V800R009C10SPC200
Feature Description
Issue 01
Date 2018-05-04
NE20E-S2
Contents
1 Feature Description
1.1 Using the Packet Format Query Tool
1.2 VRPv8 Overview
1.2.1 About This Document
1.2.2 VRP8 Overview
1.2.2.1 Introduction
1.2.2.1.1 Introduction of VRP8
1.2.2.1.2 Development of the VRP
1.2.2.2 Architecture
1.2.2.2.1 VRP8 Componentization
1.2.2.2.2 VRP8 Virtualized Hierarchy
1.2.2.2.3 VRP8 High Extensibility
1.2.2.2.4 VRP8 Carrier-Class Management and Maintenance
1.2.2.2.5 Advantages of the VRP8 Architecture
1.3 Basic Configurations
1.3.1 About This Document
1.3.2 TTY
1.3.2.1 Introduction
1.3.2.2 Principles
1.3.2.2.1 TTY
1.3.3 Telnet
1.3.3.1 Introduction
1.3.3.2 Principles
1.3.3.2.1 Telnet
1.3.3.3 Applications
1.3.3.3.1 Telnet
1.3.4 SSH
1.3.4.1 Introduction
1.3.4.2 Principles
1.3.4.2.1 SSH
1.3.4.3 Applications
1.3.4.3.1 Supporting STelnet
1.3.4.3.2 Supporting SFTP
1.5.4.3.1 MPLS OAM Application in the IP RAN Layer 2 to Edge Scenario
1.5.4.3.2 Application of MPLS OAM in VPLS Networking
1.5.4.4 Terms and Abbreviations
1.5.5 MPLS-TP OAM
1.5.5.1 Introduction
1.5.5.2 Principles
1.5.5.2.1 Basic Concepts
1.5.5.2.2 Continuity Check and Connectivity Verification
1.5.5.2.3 Packet Loss Measurement
1.5.5.2.4 Frame Delay Measurement
1.5.5.2.5 Remote Defect Indication
1.5.5.2.6 Loopback
1.5.5.3 Applications
1.5.5.3.1 MPLS-TP OAM Application in the IP RAN Layer 2 to Edge Scenario
1.5.5.3.2 Application of MPLS-TP OAM in VPLS Networking
1.5.5.4 Terms and Abbreviations
1.5.6 VRRP
1.5.6.1 Introduction
1.5.6.2 Principles
1.5.6.2.1 Basic VRRP Concepts
1.5.6.2.2 VRRP Packets
1.5.6.2.3 VRRP Operating Principles
1.5.6.2.4 Basic VRRP Functions
1.5.6.2.5 mVRRP
1.5.6.2.6 Association Between VRRP and a VRRP-disabled Interface
1.5.6.2.7 BFD for VRRP
1.5.6.2.8 VRRP Tracking EFM
1.5.6.2.9 VRRP Tracking CFM
1.5.6.2.10 VRRP Association with NQA
1.5.6.2.11 Association Between a VRRP Backup Group and a Route
1.5.6.2.12 Association Between Direct Routes and a VRRP Backup Group
1.5.6.2.13 Traffic Forwarding by a Backup Device
1.5.6.2.14 Rapid VRRP Switchback
1.5.6.3 Applications
1.5.6.3.1 IPRAN Gateway Protection Solution
1.5.6.4 Terms, Acronyms, and Abbreviations
1.5.7 Ethernet OAM
1.5.7.1 Introduction
1.5.7.2 EFM Principles
1.5.7.2.1 Basic Concepts
1.5.7.2.2 Background
1.5.7.2.3 Basic Functions
1.14.8.2.1 Centralized Management of IP Hard-Pipe-based Leased Line Services on the NMS
1.14.8.2.2 Interface-based Hard Pipe Bandwidth Reservation
1.14.8.2.3 AC Interface Service Bandwidth Limitation
1.14.8.2.4 Hard-Pipe-based TE LSP
1.14.8.2.5 Hard-Pipe-based VLL/PWE3
1.14.8.2.6 Hard Pipe Reliability
1.14.8.2.7 Hard Pipe Service Quality Monitoring
1.14.8.3 Applications
1.14.8.3.1 Hard-Pipe-based Enterprise Leased Line Application
1.14.8.3.2 Hard-Pipe-based Enterprise Leased Line Protection
1.14.8.3.3 Hard-Pipe-based Leased Line Services Implemented by Huawei and Non-Huawei Devices
1.14.8.4 Terms, Acronyms, and Abbreviations
1.14.9 VPLS
1.14.9.1 Introduction
1.14.9.2 Principles
1.14.9.2.1 VPLS Description
1.14.9.2.2 VPLS Functions
1.14.9.2.3 LDP VPLS
1.14.9.2.4 BGP VPLS
1.14.9.2.5 HVPLS
1.14.9.2.6 BGP AD VPLS
1.14.9.2.7 Inter-AS VPLS
1.14.9.2.8 VPLS PW Redundancy
1.14.9.2.9 Multicast VPLS
1.14.9.2.10 VPLS Multi-homing
1.14.9.3 Applications
1.14.9.3.1 Application of VPLS in Residential Services
1.14.9.3.2 Application of VPLS in Enterprise Services
1.14.9.3.3 VPLS PW Redundancy for Protecting Multicast Services
1.14.9.3.4 VPLS PW Redundancy for Protecting Unicast Services
1.14.9.3.5 Application of Multicast VPLS
1.14.9.3.6 VPWS Accessing VPLS
1.14.9.3.7 VPLS Multi-homing Application
1.14.10 L2VPN Accessing L3VPN
1.14.10.1 Introduction
1.14.10.2 Principles
1.14.10.2.1 Basic Concepts and Implementation
1.14.10.2.2 Classification of L2VPN Accessing L3VPN
1.14.10.3 Applications
1.14.10.3.1 VPWS Accessing L3VPN
1.14.10.3.2 VPLS Accessing L3VPN
1.14.10.4 Terms, Acronyms, and Abbreviations
1 Feature Description
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
Network planning engineers
Commissioning engineers
Data configuration engineers
System maintenance engineers
Security Declaration
Encryption algorithm declaration
The encryption algorithms DES/3DES/SKIPJACK/RC2/RSA (RSA-1024 or
lower)/MD2/MD4/MD5 (in digital signature scenarios and password encryption)/SHA1
(in digital signature scenarios) provide low security and may introduce security risks.
If the protocols allow, using more secure encryption algorithms, such as AES/RSA
(RSA-2048 or higher)/SHA2/HMAC-SHA2, is recommended.
Password configuration declaration
− Do not set both the start and end characters of a password to "%^%#". This causes
the password to be displayed directly in the configuration file.
− To further improve device security, periodically change the password.
Personal data declaration
Your purchased products, services, or features may use some of users' personal data
during service operation or fault locating. You must define user privacy policies in
compliance with local laws and take proper measures to fully protect personal data.
Feature declaration
Special Declaration
This document serves only as a guide. The content is written based on device
information gathered under lab conditions and is intended as general guidance; it does
not cover all scenarios. The content of this document may differ from the information
on user device interfaces due to factors such as version upgrades and differences in
device models, board restrictions, and configuration files. The actual user device
information takes precedence over the content provided by this document. The
preceding differences are beyond the scope of this document.
The maximum values provided in this document are obtained in specific lab
environments (for example, only a certain type of board or protocol is configured on a
tested device). The maximum values actually obtained may differ from those provided
in this document due to factors such as differences in hardware configurations and
carried services.
Interface numbers used in this document are examples. Use the existing interface
numbers on devices for configuration.
The pictures of hardware in this document are for reference only.
Symbol Conventions
The symbols that may be found in this document are defined as follows.

Symbol Description
DANGER: Indicates an imminently hazardous situation which, if not avoided, will result in death or serious injury.
WARNING: Indicates a potentially hazardous situation which, if not avoided, could result in death or serious injury.
Change History
Updates between document issues are cumulative. Therefore, the latest document issue
contains all updates made in previous issues.
Changes in Issue 03 (2017-09-20)
This issue is the third official release. The software version of this issue is
V800R009C10SPC200.
Changes in Issue 02 (2017-07-30)
This issue is the second official release. The software version of this issue is
V800R009C10SPC100.
Changes in Issue 01 (2017-05-30)
This issue is the first official release. The software version of this issue is
V800R009C10.
− TCP
− IPv4/IPv6 dual stack
− Diverse user link access techniques
− Unicast routing protocols
− Multiprotocol Label Switching (MPLS) protocols, including MPLS Label
Distribution Protocol (LDP) and MPLS traffic engineering (TE)
Value-added services include:
− User access control
− Security
− Firewall
− L3VPN (Layer 3 Virtual Private Network)
The network devices running the VRP are configured and managed on the following universal
management interfaces:
Command-line interface (CLI)
SNMP
NETCONF
As a large-scale IP routing software package, the VRP has been developed based on industry
standards and passes rigorous tests before each release. Major features and specifications of
the VRP comply with industry standards, including those defined by the Internet Engineering
Task Force (IETF) and the International Telecommunication Union-Telecommunication
Standardization Sector (ITU-T). The VRP software platform has also been verified by the
market: so far, the VRP has been installed on more than 2,000,000 network devices. As IP
technologies and hardware develop, new VRP versions are released to provide higher
performance, extensibility, and reliability, as well as more value-added services.
The VRP5 is a distributed network operating system featuring high extensibility, reliability,
and performance. Currently, network devices running VRP5 serve more than 50 carriers
worldwide. The VRP5 provides a rich feature set, and its stability has withstood the test of
the market.
The VRP8 is a new-generation network operating system with a distributed, multi-process,
component-based architecture. The VRP8 supports distributed applications and virtualization
techniques. It builds on hardware development trends and is designed to meet carriers'
rapidly growing service requirements over the next five to ten years.
1.2.2.2 Architecture
1.2.2.2.1 VRP8 Componentization
Componentization refers to the method of encapsulating associated functions and data into a
software module, which is instantiated to function as a basic unit of communication
scheduling. The VRP8 architecture design is component-based. The entire system is divided
into multiple independent components that communicate through interfaces. One component
provides services for another through an interface, and the served component does not need
to know how the serving component provides its services.
The component-based architecture design has the following advantages:
Components are replaceable.
A component can be replaced by another component if the substitute provides the same
functions and services as those of the replaced component. The new component can even
use a different programming language. This enables a user to upgrade or add VRP8
components.
Components are reusable.
High-quality software components can serve for a long time and are stored in the
software database. This allows the VRP8 software to be customized for a product
architecture that is quite different from its original hardware platform.
Components are distributable.
VRP8 components can be deployed in a distributed manner: two related components can
run on different nodes and communicate with each other across networks. Component
distribution is implemented without modifying the components themselves; only the
data of the related deployment policies needs to be modified.
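The replaceability property described above can be illustrated with a small sketch (Python is used purely for illustration; VRP8 is not written in Python, and every name below is hypothetical). A consumer depends only on a service interface, so any component implementing that interface can be substituted without the consumer knowing:

```python
from abc import ABC, abstractmethod

class RouteService(ABC):
    """Hypothetical service interface: consumers depend only on this contract."""

    @abstractmethod
    def lookup(self, prefix: str) -> str:
        """Return the next hop for a destination prefix."""

class StaticRouteComponent(RouteService):
    """One interchangeable implementation, backed by a static table."""

    def __init__(self, table):
        self.table = table

    def lookup(self, prefix):
        return self.table.get(prefix, "drop")

class DefaultRouteComponent(RouteService):
    """A substitute implementation: same interface, different behavior."""

    def lookup(self, prefix):
        return "10.0.0.1"  # always forward to a default gateway

def forward(service: RouteService, prefix: str) -> str:
    # The served code never needs to know which component serves it.
    return service.lookup(prefix)
```

Either component can be handed to `forward()`; swapping one for the other requires no change to the consumer, which is the point of interface-based componentization.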
A physical system (PS) can be divided into multiple virtual systems (VSs). Each VS can be
separately configured with services. VSs share a line card's physical interfaces, hardware
forwarding resources, and the processing capability of the control plane.
Virtualization techniques provide the following functions:
Virtualized networks
VSs of a device can be leased to enterprise users and other service providers (SPs), which
reduces CAPEX and OPEX and provides a higher level of security and reliability.
Flat networks
VSs allow one device to have the functionalities of multiple devices, such as provider (P)
and provider edge (PE) devices, which simplifies the network architecture and flattens
the network.
Multi-service over a network
A variety of services are separately deployed on various VSs. These VSs form a logical
multi-service network. VSs allow services to be independent of each other, which
improves security and reliability and reduces CAPEX and OPEX.
New service verification
After a device is divided into VSs, new services such as IPv6 services or video services
can be verified separately without affecting existing services. VSs carrying various
services form a logical network, which improves security and reliability.
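The partitioning model above (one physical system, several independently configured virtual systems sharing its interfaces) can be sketched roughly as follows. This is an illustrative Python model, not VRP8 code; the class names and interface names are invented for the example:

```python
class PhysicalSystem:
    """Hypothetical model: a PS owns physical interfaces shared out to VSs."""

    def __init__(self, interfaces):
        self.free_interfaces = set(interfaces)
        self.virtual_systems = {}

    def create_vs(self, name, wanted):
        """Carve out a VS; an interface can belong to only one VS at a time."""
        wanted = set(wanted)
        if not wanted <= self.free_interfaces:
            raise ValueError("interface already allocated to another VS")
        self.free_interfaces -= wanted
        # Each VS holds its own service configuration independently.
        self.virtual_systems[name] = {"interfaces": wanted, "services": []}

    def configure_service(self, name, service):
        """Configure a service on one VS without touching the others."""
        self.virtual_systems[name]["services"].append(service)
```

Deploying a trial service (say, IPv6) on one VS then leaves the configuration of every other VS untouched, which is why new-service verification does not disturb existing services.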
Figure 1-4 Improving performance and capacity extensibility through VRP8 distribution
On the VRP8, the data plane adopts a model-based forwarding technique. A new function can
be implemented, or an existing function changed, on the forwarding plane merely by changing
the forwarding model rather than the code, enabling quick responses to carriers' demands.
To support various network interfaces, an IP network device usually supports various line card
types. The problem is that these cards historically needed to be replaced as technology
progressed and chips were replaced or updated. To help carriers maximize the return on
investment and avoid large-scale line card replacement, the software needs to support forward
and backward compatibility of line cards. The VRP8 implements forward and backward
compatibility of line cards over the standard driver framework with the help of software and
hardware decoupling techniques.
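The decoupling idea can be sketched as a stable driver contract that both old and new card generations implement, so the control software never changes when hardware does. This is an invented Python illustration, not the actual driver framework:

```python
class LineCardDriver:
    """Hypothetical stable driver contract that the OS programs against."""

    def program_route(self, prefix, next_hop):
        raise NotImplementedError

class GenOneCard(LineCardDriver):
    """Older-generation card: writes entries directly into its table."""

    def __init__(self):
        self.tcam = {}

    def program_route(self, prefix, next_hop):
        self.tcam[prefix] = next_hop

class GenTwoCard(LineCardDriver):
    """Newer-generation card: same contract, different chip mechanics."""

    def __init__(self):
        self.batch = []

    def program_route(self, prefix, next_hop):
        self.batch.append((prefix, next_hop))

def install_routes(cards, routes):
    # The control software is identical across card generations.
    for card in cards:
        for prefix, next_hop in routes:
            card.program_route(prefix, next_hop)
```

Because `install_routes()` depends only on the contract, a chassis can mix card generations, which is the forward/backward compatibility the text describes.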
Configuration Management
As shown in Figure 1-7, the VRP8 management plane adopts a hierarchical architecture,
consisting of the following elements:
Configuration tools
Configuration information model
Configuration data
The VRP8 management plane provides the following functions:
Support for various existing configuration tools and more
Implementation of model-based configuration
Data verification and configuration rollback
Database-assisted configuration data recovery
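The data-verification and rollback behavior listed above can be sketched as follows. This is illustrative Python only; the function names and the validation rule are assumptions for the example, not VRP8 APIs:

```python
import copy

def commit(running_config, candidate, validate):
    """Apply a candidate configuration; roll back if validation fails.

    validate -- callable returning (ok, reason) for a proposed configuration.
    """
    checkpoint = copy.deepcopy(running_config)   # rollback point
    running_config.update(candidate)
    ok, reason = validate(running_config)
    if not ok:
        # Verification failed: restore the checkpoint untouched.
        running_config.clear()
        running_config.update(checkpoint)
        return False, reason
    return True, "committed"
```

The key property is that a rejected candidate leaves the running configuration exactly as it was, so an operator can attempt a change without risking the device's current state.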
Fault Management
As shown in Figure 1-8, the VRP8 implements fault management based on service objects.
The VRP8 creates a service object relationship model to analyze the correlation between
alarms, filter out invalid alarms, and report root alarms, speeding up fault identification.
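The correlation step described above (use object relationships to suppress derivative alarms and report only root alarms) can be sketched with a small model. Python is used for illustration; the real VRP8 service-object relationship model is far richer:

```python
def root_alarms(alarms, depends_on):
    """Report only root alarms: suppress any alarm whose cause is also alarming.

    alarms     -- set of service objects currently raising alarms
    depends_on -- maps a service object to the objects it depends on
    """
    roots = set()
    for obj in alarms:
        causes = depends_on.get(obj, [])
        # If none of the objects this one depends on is alarming,
        # this alarm has no upstream cause and is a root alarm.
        if not any(cause in alarms for cause in causes):
            roots.add(obj)
    return roots
```

For example, if a VPN depends on a BGP peer which depends on an interface, an interface failure raises all three alarms, but only the interface alarm is reported as the root cause, which is what speeds up fault identification.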
Performance Management
As shown in Figure 1-9, the VRP8 provides a flexible performance management mechanism.
Information about an object to be monitored, including a description of the object and a
monitoring threshold, can be manually defined on a configuration interface. The configuration
data can then be delivered by the central database. The APP component collects statistics
about the configured object and sends them to a Perf Management server through a
performance management (PM) agent. After receiving the statistics, the Perf Management
server generates information about a fault based on the pre-defined object and monitoring
threshold and then sends this fault information to the network management system (NMS)
through the fault management center. Performance information can be viewed by running a
command or through the NMS.
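The threshold-crossing step in the flow above (user-defined object and threshold, collected statistics, fault generated on crossing) can be sketched as follows. This is an illustrative Python fragment; the data layout and names are invented, not the PM agent's actual format:

```python
def check_thresholds(monitored, samples):
    """Compare collected statistics against user-defined thresholds.

    monitored -- {object_name: {"description": ..., "threshold": ...}},
                 as defined on the configuration interface
    samples   -- {object_name: measured value}, as collected by the APP component
    Returns a fault record for every object whose sample exceeds its threshold.
    """
    faults = []
    for name, spec in monitored.items():
        value = samples.get(name)
        if value is not None and value > spec["threshold"]:
            faults.append({
                "object": name,
                "description": spec["description"],
                "value": value,
                "threshold": spec["threshold"],
            })
    return faults
```

A fault record like this is what the server would forward to the NMS through the fault management center; objects within their thresholds produce nothing.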
Plug-and-Play
As shown in Figure 1-10, VRP8 plug-and-play allows a large number of devices to be
deployed at a site in one batch and then managed and maintained remotely, reducing
OPEX.
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
Network planning engineers
Commissioning engineers
Data configuration engineers
Security Declaration
Encryption algorithm declaration
The encryption algorithms DES, 3DES, SKIPJACK, RC2, RSA (RSA-1024 or
lower), MD2, MD4, MD5 (in digital signature scenarios and password encryption), and SHA1
(in digital signature scenarios) provide low security and may pose security risks. If the
protocols allow, using more secure encryption algorithms, such as AES, RSA
(RSA-2048 or higher), SHA2, and HMAC-SHA2, is recommended.
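To illustrate the recommendation above, the following sketch contrasts a deprecated MD5 digest with SHA-256 and HMAC-SHA256 from Python's standard library (the sample secret is hypothetical):

```python
import hashlib
import hmac
import os

# Hypothetical sample secret; never hard-code real passwords.
password = b"example-password"

# Deprecated: MD5 (and SHA1) are considered weak for signatures and passwords.
weak = hashlib.md5(password).hexdigest()       # 32 hex characters; avoid

# Recommended: SHA2 family (SHA-256 shown) or HMAC-SHA2 with a random key.
strong = hashlib.sha256(password).hexdigest()  # 64 hex characters

key = os.urandom(32)
mac = hmac.new(key, password, hashlib.sha256).hexdigest()

print(len(weak), len(strong), len(mac))
```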
Password configuration declaration
− Do not set both the start and end characters of a password to "%^%#". Otherwise,
the password is displayed directly in the configuration file.
− To further improve device security, periodically change the password.
Personal data declaration
Your purchased products, services, or features may use some of users' personal data
during service operation or fault locating. You must define user privacy policies in
compliance with local laws and take proper measures to fully protect personal data.
Feature declaration
− The NetStream feature may be used to analyze the communication information of
terminal customers for network traffic statistics and management purposes. Before
enabling the NetStream feature, ensure that its use falls within the boundaries
permitted by applicable laws and regulations. Effective measures must be taken to
ensure that the information is securely protected.
− The mirroring feature may be used to analyze the communication information of
terminal customers for maintenance purposes. Before enabling the mirroring
function, ensure that its use falls within the boundaries permitted by applicable
laws and regulations. Effective measures must be taken to ensure that the
information is securely protected.
− The packet header obtaining feature may be used to collect or store some
communication information about specific customers for transmission fault and
error detection purposes. Huawei cannot offer services to collect or store this
information unilaterally. Before enabling the function, ensure that its use falls
within the boundaries permitted by applicable laws and regulations. Effective
measures must be taken to ensure that the information is securely protected.
Reliability design declaration
Network planning and site design must comply with reliability design principles and
provide device-level and solution-level protection. Device-level protection includes
dual-network and inter-board dual-link planning principles to avoid single points of
failure on a node or link. Solution-level protection refers to fast convergence
mechanisms, such as FRR and VRRP.
Special Declaration
This document serves only as a guide. The content is written based on device
information gathered under lab conditions. The content provided by this document is
intended to be taken as general guidance, and does not cover all scenarios. The content
provided by this document may be different from the information on user device
interfaces due to factors such as version upgrades and differences in device models,
board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are
beyond the scope of this document.
The maximum values provided in this document are obtained in specific lab
environments (for example, only a certain type of board or protocol is configured on a
tested device). The actually obtained maximum values may be different from the
maximum values provided in this document due to factors such as differences in
hardware configurations and carried services.
Interface numbers used in this document are examples. Use the existing interface
numbers on devices for configuration.
The pictures of hardware in this document are for reference only.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
Indicates an imminently hazardous situation which, if not
avoided, will result in death or serious injury.
Change History
Updates between document issues are cumulative. Therefore, the latest document issue
contains all updates made in previous issues.
Changes in Issue 03 (2017-09-20)
This issue is the third official release. The software version of this issue is
V800R009C10SPC200.
Changes in Issue 02 (2017-07-30)
This issue is the second official release. The software version of this issue is
V800R009C10SPC100.
1.3.2 TTY
1.3.2.1 Introduction
Terminal type (TTY), also called terminal service, provides access interfaces and
human-machine interfaces (HMIs) for you to configure routers. TTY supports the following ports:
Console port
Virtual type terminal (VTY) port
Routers support user login over console or VTY ports. You can use a console port to set user
interface parameters, such as the speed, data bits, stop bits, and parity. You can also initiate a
Telnet or Secure Shell (SSH) session to log in to a VTY port.
1.3.2.2 Principles
1.3.2.2.1 TTY
User Management
You can configure, monitor, and maintain local or remote network devices only after
configuring user interfaces, user management, and terminal services. User interfaces provide
login venues, user management ensures login security, and terminal services provide login
protocols. Routers support user login over console ports.
User Interface
A user interface is presented in the form of a user interface view for you to log in to a router.
You can use user interfaces to set parameters on all physical and logical interfaces that work
in asynchronous and interactive modes, and manage, authenticate, and authorize login users.
Routers allow users to access user interfaces through console ports.
A console port is provided by the IPU of a router. The IPU provides one console port that
conforms to the EIA/TIA-232 standard. The console port is a data circuit-terminating
equipment (DCE) interface. The serial port of a user terminal can be directly connected to a
router's console port for local configuration.
User Login
When a router starts for the first time, no user name or password is configured on it. However,
the router prompts you to configure a password during the first login. After you configure a
password for a router, you must enter the configured password before logging in to the router
through the console port.
When a router is powered on for the first time, you must log in to the router through the console port,
which is a prerequisite for other login modes as well. For example, you can use Telnet to log in to a
router only after you use the console port to log in to the router and configure an IP address.
1.3.3 Telnet
1.3.3.1 Introduction
The Telecommunication Network Protocol (Telnet) originated on ARPANET, which was
launched in 1969. It is one of the earliest Internet applications.
A Telnet connection is a Transmission Control Protocol (TCP) connection used to transmit
data with interspersed Telnet control information. Telnet uses the client/server model to
present an interactive interface that enables a terminal to remotely log in to a server. A user
can log in to one host and then use Telnet to remotely log in to and configure and manage
multiple hosts without having to connect each one to a terminal. Figure 1-11 shows the Telnet
client/server model.
In Figure 1-11:
Telnet uses TCP for transmission.
All Telnet echo information is displayed on the terminal.
The server directly interacts with the pseudo terminal.
The server and client transmit commands and data over the TCP connection.
The client logs in to the server.
1.3.3.2 Principles
1.3.3.2.1 Telnet
Telnet applies to any host or terminal. The client's operating system maps a terminal to a
network virtual terminal (NVT) regardless of the terminal's type. Then, the server maps the
NVT into a supported terminal type. This mapping masks client and terminal types.
Communicating ends are assumed to be connected to the NVTs.
Telnet uses the symmetric client/server mode. Therefore, each end of a Telnet connection must have an
NVT.
Two communicating ends negotiate options by sending WILL, WONT, DO, and DONT
requests. The options are used to determine the content of the Telnet service and include the
echo information, command change character set, and line mode.
Requests in Telnet
Either communicating end can initiate a request. Table 1-1 describes requests in Telnet.
When the sender sends a WONT or DONT request, the receiver must grant the request.
When the sender sends a WILL or DO request, the receiver can grant or reject the
request.
− If the receiver grants the request, the option immediately takes effect.
− If the receiver rejects the request, the option does not take effect. The sender can
still retain the NVT function.
Option Negotiation
Option negotiation requires the following three items:
IAC
A WILL, DO, WONT, or DONT request
Option ID
The following is an example of option negotiation.
The server wants to enable option 33 "remote flow control," and the client grants the request.
The exchanged commands are as follows:
On the server: <IAC, WILL, 33>
On the client: <IAC, DO, 33>
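The exchange above can be written out as raw Telnet command bytes. A minimal sketch (the command constants follow RFC 854/855; option 33 is used only to mirror the example above):

```python
# Telnet command bytes (RFC 854): IAC = 255, WILL = 251, WONT = 252,
# DO = 253, DONT = 254.
IAC, WILL, WONT, DO, DONT = 255, 251, 252, 253, 254

def offer(option: int) -> bytes:
    """Sender: <IAC, WILL, option> asks to enable an option on its side."""
    return bytes([IAC, WILL, option])

def accept(option: int) -> bytes:
    """Receiver: <IAC, DO, option> grants the offer."""
    return bytes([IAC, DO, option])

# The server wants to enable option 33; the client grants the request.
server_msg = offer(33)   # b'\xff\xfb!'
client_msg = accept(33)  # b'\xff\xfd!'
```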
Suboption Negotiation
In addition to an option ID, other information may be required. For example, if the receiver is
required to specify a terminal type, the receiver must respond with an American Standard
Code for Information Interchange (ASCII) string to identify the terminal type.
The format of suboption negotiation is as follows:
<IAC, SB, option ID, suboption content, IAC, SE>
A complete suboption negotiation process is as follows:
1. The sender asks to enable the option by sending a DO/WILL request carrying the option
ID.
2. The receiver grants the request by sending a WILL/DO request carrying the option ID.
Through the preceding steps, both communicating ends agree to enable the option.
3. Either communicating end sends the request carrying the suboption ID through the
suboption-begin (SB) command and ends the suboption negotiation through the
suboption-end (SE) command.
4. The other end responds to the suboption negotiation through the SB command, suboption
codes, and related negotiation information, and then ends the suboption.
5. The receiver responds with a DO/WILL message to grant the request.
If there is no other suboption to be negotiated, the current negotiation is complete.
Assume, for demonstration purposes, that the receiver grants the request from the sender.
In practice, the receiver can reject the request from the server at any time.
The sender can request the terminal type only when the option negotiation type is DO.
The sender can relay the actual terminal type only when the option negotiation type is WILL.
The terminal type cannot be sent automatically. It is sent only for responding to the request, that is, it
is sent in request-response mode.
The terminal type information is a case-insensitive NVT ASCII string.
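The request-response exchange for the terminal type described in the notes above can be sketched in code. This assumes the TERMINAL-TYPE option (option 24, RFC 1091), in which subcode SEND asks the peer for its terminal type and subcode IS carries the NVT ASCII answer:

```python
# Telnet suboption framing (RFC 854/1091).
IAC, SB, SE = 255, 250, 240
TERMINAL_TYPE = 24  # option ID for TERMINAL-TYPE
SEND, IS = 1, 0     # suboption subcodes

def send_request() -> bytes:
    """<IAC, SB, 24, SEND, IAC, SE>: ask the peer for its terminal type."""
    return bytes([IAC, SB, TERMINAL_TYPE, SEND, IAC, SE])

def is_response(term: str) -> bytes:
    """<IAC, SB, 24, IS, 'VT100', IAC, SE>: answer with a terminal type."""
    return (bytes([IAC, SB, TERMINAL_TYPE, IS])
            + term.encode("ascii")
            + bytes([IAC, SE]))

def parse_terminal_type(frame: bytes) -> str:
    """Extract the NVT ASCII terminal-type string from an IS response."""
    assert frame[:4] == bytes([IAC, SB, TERMINAL_TYPE, IS])
    assert frame[-2:] == bytes([IAC, SE])
    return frame[4:-2].decode("ascii")

print(parse_terminal_type(is_response("VT100")))  # VT100
```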
Operating Modes
Telnet supports the following operating modes:
Half-duplex
One character at a time
One line at a time
Line mode
Symmetry
Symmetry in the negotiation syntax allows either the client or server to request a particular
option as required, which optimizes the services provided by the other party. A terminal
protocol allows a terminal to interact with an application process on a host. It also allows
process-process and terminal-terminal interactions.
1.3.3.3 Applications
1.3.3.3.1 Telnet
Telnet applies to remote login. You can use Telnet to configure, monitor, and maintain remote
or local devices.
As shown in Figure 1-13, you can use Telnet on Device A to remotely log in to Device B.
1.3.4 SSH
1.3.4.1 Introduction
Telnet access is not secure because there is no authentication method and the data transmitted
across Transmission Control Protocol (TCP) connections is in plaintext. As a result, the
system is vulnerable to denial of service (DoS), IP address spoofing, and route spoofing
attacks.
As network security becomes increasingly critical, using Telnet and File Transfer Protocol
(FTP) to transmit passwords and data in plaintext proves increasingly vulnerable.
Secure Shell (SSH) resolves this issue. SSH encrypts the transmitted data to provide networks
with security services and therefore ensures security during remote login.
SSH exchanges data using TCP. It builds a secure channel over TCP. In addition to the
standard port (port 22), SSH supports access from other service ports to prevent unauthorized
access.
SSH has three versions: SSH1.0, SSH1.5, and SSH2.0. The NE20E implements SSH2.0, which is
backward compatible.
Unless specified otherwise, SSH in this document refers to SSH2.0.
1.3.4.2 Principles
1.3.4.2.1 SSH
SSH Client
The SSH client function allows you to establish SSH connections with a router that can
function as an SSH server or with a UNIX host. Figure 1-14 and Figure 1-15 show the setup
of SSH channels for a local area network (LAN) and a wide area network (WAN),
respectively.
SFTP
SFTP is short for SSH FTP, a secure file transfer protocol based on SSH. It allows users to
log in to a remote device securely for file management and transfer, enhancing the security
of data transmission. In addition, a device functioning as an SFTP client can log in to a
remote SSH server.
STelnet
STelnet is a secure Telnet protocol based on SSH2.0. Unlike Telnet, SSH authenticates
clients and encrypts data in both directions to guarantee secure transmission over a
conventional insecure network.
SCP
Secure Copy (SCP) is based on SSH2.0. It guarantees secure file transfer in a traditionally
insecure network environment by authenticating the client and encrypting the transmitted data
over SSH.
SCP uses Secure Shell (SSH) for data transfer and uses the same mechanisms for
authentication, thereby ensuring the authenticity and confidentiality of the data in transit. A
client can send (upload) files to a server and can also request (download) files or directories
from a server. SCP runs over TCP port 22 by default.
Unlike SFTP, SCP allows file uploading or downloading without user authentication and
public key assignment, and also supports file uploading or downloading in batches.
Only authorized clients can set up socket connections with the SSH server through the
non-standard port. The clients and server then negotiate an SSH version, algorithms, and
session keys. User authentication, session requests, and interactive sessions are performed
subsequently.
SSH can be applied on switched or edge devices across the network to implement secure user
access and management on the devices.
Supports data encryption algorithms, such as Data Encryption Standard (DES), 3DES,
and Advanced Encryption Standard (AES).
Encrypts the data exchanged between the SSH client and the server, including the user
name and password. This encryption prevents the password from being intercepted.
SM2 elliptic curve cryptography (ECC) algorithm
The SM2 algorithm is based on ECC. Both the SM2 and RSA algorithms belong to the
asymmetric cryptography system. The differences between the ECC and RSA
algorithms are as follows:
− The RSA algorithm is based on large-number factorization, which requires long
keys. Long keys slow down computation and complicate key storage and
management.
− The ECC algorithm is based on the discrete logarithm problem, which is difficult
to crack, making ECC more secure.
Compared with the RSA algorithm, the ECC algorithm secures encryption with
shorter keys while ensuring the same security, which speeds up encryption. The ECC
algorithm has the following advantages over the RSA algorithm:
− Provides the same security with a shorter key length.
− Features a shorter computing process and higher processing speed.
− Requires less storage space.
− Requires lower bandwidth.
To ensure high security, do not use the DES or 3DES algorithm, or an RSA key shorter than
2048 bits, for SSH user authentication or data encryption. You are advised to use the more
secure ECC authentication algorithm.
Supporting ACL
The SSH server can use access control lists (ACLs) to limit SSH users' incoming and
outgoing call rights. ACLs prevent unauthorized users from setting up TCP connections
and entering the SSH negotiation phase, which improves SSH server access security.
1.3.4.3 Applications
1.3.4.3.1 Supporting STelnet
STelnet is based on SSH. The client and server set up a secure connection through negotiation.
The client can then log in to the server through the secure Telnet service.
Supports the NETCONF file transfer process and provides acknowledgment of file
transfer success or failure.
Figure 1-19 shows an SFTP application.
For example, a directory contains multiple files and sub-directories. SCP can be used to
transfer all files in the directory in a batch without changing the hierarchical directory
structure.
Definition
The command line interface (CLI) is an interface through which you can interact with a router.
The system provides a series of commands that allow you to configure and manage the router.
Purpose
The CLI is a traditional configuration tool available on most data communication
products. However, with the wider application of data communication products worldwide,
customers require a more usable, flexible, and user-friendly CLI.
Carrier-class devices have strict requirements for system security. Users must pass the
Authentication, Authorization and Accounting (AAA) authentication before logging in to a
CLI or before running commands, which ensures that users can view and use only the
commands that match their rights.
1.3.5.2 Principles
1.3.5.2.1 Principles of the Command Line Interface
The CLI is a key configuration tool. After you log in to a router, a prompt is displayed,
indicating that you have accessed the CLI and can enter a command.
The CLI parses commands and packets carrying configuration information. You can use the
CLI to configure and manage routers. The CLI also provides an online help function.
− When you enter a command followed by a space and a question mark (?), the value
range and function of the parameter are listed if the position of the question mark (?)
is for a parameter.
To provide full help in command mode, the CLI undergoes the following phases:
a. Command receiving phase
The CLI receives and displays all characters you have entered. When you enter a
question mark (?), the CLI starts online help. If full help is required, the system
starts full help.
b. Command matching phase
The system compares the received command with commands in the current
command mode to search for a matching command.
If a matching command exists, the system matches commands with your rights
and displays all commands you can use.
If a matching command does not exist, the system informs you that the
command is invalid and waits for a new command.
c. Command help phase
The system searches the configurable commands for possible elements in the
question mark (?) position.
If the entered command is complete, cr is displayed.
If the entered command is incomplete, possible command elements and their
description are displayed.
Partial help
− When you enter a string followed by a question mark (?), the system lists all
keywords that start with the string.
− When you enter a command followed by a question mark (?):
If the position of the question mark (?) is for a keyword, all keywords in the
command starting with the string are listed.
If the position of the question mark (?) is for a parameter and the parameter is
valid, information about all the parameters starting with the string is listed,
including the value range.
If the position of the question mark (?) is for a parameter but the parameter is
invalid, the CLI informs you that the input is incorrect.
To provide partial help in specific command mode, the CLI undergoes the following
phases:
a. Command receiving phase
The CLI receives and displays all characters you have entered. When you enter a
question mark (?), the CLI starts online help. If partial help is required, the system
starts partial help.
b. Command matching phase
The system compares the received command with commands in the current
command mode to search for a matching command.
If a matching command exists, the system matches commands with your rights
and displays all commands you can use.
If a matching command does not exist, the system informs you that the
command is invalid and waits for a new command.
c. Command help phase
The system searches configurable commands for possible command elements in the
position of a question mark (?) and displays possible command elements.
Tab help
Tab help is an application of partial help, which provides help only for keywords. The
system does not display the description of a keyword.
You can enter the first letters of a keyword in a command and press Tab.
− If what you have entered identifies a unique keyword, the complete keyword is
displayed.
− If what you have entered does not identify a unique keyword, you can press Tab
repeatedly to view the matching keywords and select the desired one.
− If what you have entered does not match any command element, the system does
not modify the input and just displays what you have entered.
− If what you have entered is not a keyword in the command, the system does not
modify the input and just displays what you have entered.
The CLI also provides dynamic help for querying the database and script. If parameters
in a command support dynamic help and you enter the first letters of a parameter in the
command and press Tab, the following situations occur:
− If what you have entered identifies a unique parameter, the complete parameter is
displayed.
− If what you have entered does not identify a unique parameter, you can press Tab
repeatedly to view the matching parameters and select the desired one.
Different terminal software defines shortcut keys differently. Therefore, the shortcut keys on your
terminal may be different from those listed here.
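The partial-help and Tab-help behavior described above reduces to prefix matching over the keywords valid in the current command mode. A minimal sketch (the keyword set is hypothetical, not an actual NE20E command set):

```python
def match_keywords(prefix: str, keywords: list[str]) -> list[str]:
    """Partial help: list all keywords in the current mode starting with prefix."""
    return [kw for kw in keywords if kw.startswith(prefix)]

def tab_complete(prefix: str, keywords: list[str]) -> str:
    """Tab help: complete only when the prefix identifies a unique keyword;
    otherwise leave the input unchanged (the user presses Tab to cycle)."""
    hits = match_keywords(prefix, keywords)
    return hits[0] if len(hits) == 1 else prefix

# Hypothetical keywords for illustration only.
mode_keywords = ["display", "dir", "debugging"]

print(match_keywords("d", mode_keywords))  # all three keywords
print(tab_complete("dis", mode_keywords))  # unique match: completed
print(tab_complete("d", mode_keywords))    # ambiguous: input unchanged
```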
1.3.5.3 Applications
None
1.3.6 SSL
1.3.6.1 Introduction
Definition
The Secure Sockets Layer (SSL) protocol is a cryptographic protocol that provides
communication security over the Internet. It allows a client and a server to communicate in a
way designed to prevent eavesdropping by authenticating the server or the client.
Purpose
SSL and application layer protocols work independently. Connections of application layer
protocols, such as Syslog, can be established based on SSL handshakes. Before a client and a
server use an application layer protocol to communicate, SSL is used to determine
cryptography, negotiate a key, and authenticate the server. Data that is then transmitted using
the application layer protocol between the client and the server will be encrypted, thereby
protecting privacy.
Benefits
SSL offers the following benefits:
Provides secure network transmission. SSL uses data encryption, authentication, and
message integrity check to ensure secure data transmission over the network.
Supports various application layer protocols. SSL is originally designed for securing
World Wide Web traffic. As SSL functions between the application and transport layers,
it secures data transmission for any application layer protocol based on TCP connections.
Achieves easy deployment.
1.3.6.2 Principles
1.3.6.2.1 SSL
Working Process
SSL protocol structure
As shown in Figure 1-25, SSL functions between the application and transport layers. It
secures data transmission for any application layer protocol based on TCP connections.
SSL is divided into two layers: lower layer with the SSL record protocol and upper layer
with the SSL handshake protocol, SSL change cipher spec protocol, and SSL alert
protocol.
− SSL record protocol: divides upper-layer information blocks into records, computes
and adds message authentication codes (MACs), encrypts records, and sends them
to the receiver.
− SSL handshake protocol: negotiates a cipher suite including a symmetric encryption
algorithm, a key exchange algorithm, and a MAC algorithm, exchanges a shared
key securely between a server and a client, and authenticates the server and client.
The client and server establish a session using the SSL handshake protocol to
negotiate session parameters including the session identifier, peer certificate, cipher
suite, and master secret.
− SSL change cipher spec protocol: used by the client and server to send a
ChangeCipherSpec message to notify the receiver that subsequent records will be
protected under the newly negotiated cipher suite and key.
− SSL alert protocol: allows one end to report alerts to the other. An alert message
conveys the alert severity and description.
SSL handshake process
The client and server negotiate session parameters during the SSL handshake process to
establish a session. Session parameters mainly include the session identifier, peer
certificate, cipher suite, and master secret. The master secret and cipher suite are used to
compute a MAC and encrypt data to be transmitted in this session.
The SSL handshake process varies according to the real-world situations. Handshake
processes in three situations are described as follows:
− SSL handshake process in which only the server is authenticated
Figure 1-26 SSL handshake process in which only the server is authenticated
As shown in Figure 1-26, only the SSL server, not the SSL client, needs to be
authenticated. The SSL handshake process is as follows:
i. The SSL client sends a ClientHello message specifying the supported SSL
protocol version and cipher suite to the SSL server.
ii. The server responds with a ServerHello message containing the protocol
version and cipher suite chosen from the choices offered by the client. If the
server allows the client to reuse this session in the future, the server sends a
ServerHello message carrying a session ID to the client.
iii. The server sends a Certificate message carrying its digital certificate with its
public key to the client.
iv. The server sends a ServerHelloDone message, indicating that the SSL protocol
version and cipher suite negotiation finishes and key information exchange
starts.
v. After verifying the digital certificate of the server, the client responds with a
ClientKeyExchange message carrying a randomly generated key (called the
master secret), which is encrypted using the public key of the server certificate.
vi. The client sends a ChangeCipherSpec message to notify the server that every
subsequent message will be encrypted and a MAC will be computed based on
the negotiated key and cipher suite.
vii. The client computes a hash for all the previous handshake messages except the
ChangeCipherSpec message, uses the negotiated key and cipher suite to
process the hash, and sends a Finished message containing the hash and MAC
to the server. The server computes a hash in the same way, decrypts the
received Finished message, and verifies the hash and MAC. If the verification
succeeds, the key and cipher suite negotiation is successful.
viii. The server sends a ChangeCipherSpec message to notify the client that
subsequent messages will be encrypted and a MAC will be computed based on
the negotiated key and cipher suite.
ix. The server computes a hash for all the previous handshake messages, uses the
negotiated key and cipher suite to process the hash, and sends a Finished
message containing the hash and MAC to the client. The client computes a
hash in the same way, decrypts the received Finished message, and verifies the
hash and MAC. If the verification succeeds, the key and cipher suite
negotiation is successful.
After receiving the Finished message from the server, if the client successfully
decrypts the message, the client checks whether the server is the owner of the
digital certificate. Only the SSL server that has a specified private key can decrypt
the ClientKeyExchange message to obtain the master secret. In this process, the
client authenticates the server.
The ChangeCipherSpec message is based on the SSL change cipher spec protocol, and other
messages exchanged in the handshake process are based on the SSL handshake protocol.
Computing a hash means that a hash algorithm (MD5 or SHA) is used to convert an arbitrary-length
message into a fixed-length message.
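On the client side, an SSL/TLS library carries out the handshake steps above; the application only supplies trust anchors and requests server verification. A minimal client-context sketch using Python's standard ssl module (no connection is made here):

```python
import ssl

# Build a client-side context: the client authenticates the server by
# verifying its certificate chain and host name, as in the handshake above.
ctx = ssl.create_default_context()           # loads system CA certificates
assert ctx.verify_mode == ssl.CERT_REQUIRED  # server certificate is mandatory
assert ctx.check_hostname                    # server name is matched as well

# Wrapping a TCP socket with ctx.wrap_socket(sock, server_hostname=...)
# would run the ClientHello/ServerHello exchange, certificate verification,
# and key exchange automatically before any application data is sent.
```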
Whether to authenticate the SSL client is determined by the SSL server. As shown
by blue arrows in Figure 1-27, if the server needs to authenticate the client, the
following operations are required in addition to the SSL handshake process in
which the client authenticates the server:
i. The server sends a CertificateRequest message to request the client to send its
certificate to the server.
ii. The client sends a Certificate message carrying its certificate and public key to
the server. After receiving the message, the server verifies the validity of the
certificate.
iii. The client computes a hash for the master secret over handshake messages,
encrypts the hash using its private key, and then sends a CertificateVerify
message to the server.
iv. The server computes a hash for the master secret over handshake messages,
decrypts the received CertificateVerify message using the public key in the
client's certificate, and compares the decrypted result with the computed hash.
If the two values are the same, client authentication succeeds.
− SSL handshake process for resuming a session
Security Mechanism
Connection privacy
SSL uses symmetric cryptography to encrypt data to be transmitted and uses the Rivest-Shamir-Adleman (RSA) key exchange algorithm, an asymmetric algorithm, to encrypt the key used by the symmetric cryptography.
To ensure high security, do not use an RSA key pair whose length is less than 2048 bits.
Identity authentication
Digitally signed certificates are used to authenticate a server and a client that attempt to
communicate with each other. Authenticating the client identity is optional. The SSL
server and client use the mechanism provided by the Public Key Infrastructure (PKI) to
apply to a CA for a certificate.
Message integrity
A keyed MAC is used to verify message integrity during transmission.
A MAC algorithm takes a key and arbitrary-length data as input and outputs a fixed-length MAC.
− A message sender uses a MAC algorithm and a key to compute a MAC and adds it
to the end of the message before sending the message to the receiver.
− The receiver uses the same key and MAC algorithm to compute a MAC and
compares the computed MAC with the MAC in the received message.
If the two MACs are the same, the message has not been tampered with during transmission.
If the two MACs are different, the message has been tampered with during transmission, and the receiver will discard this message.
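The keyed-MAC check described above can be sketched as follows, using Python's standard hmac module with SHA-256. The function names are illustrative, not device commands:

```python
import hmac
import hashlib

MAC_LEN = 32  # length of an HMAC-SHA256 tag in bytes

def attach_mac(message: bytes, key: bytes) -> bytes:
    # Sender: compute a keyed MAC over the message and append it.
    mac = hmac.new(key, message, hashlib.sha256).digest()
    return message + mac

def verify_mac(packet: bytes, key: bytes) -> bytes:
    # Receiver: recompute the MAC with the same key and compare it
    # with the MAC carried at the end of the received message.
    message, received_mac = packet[:-MAC_LEN], packet[-MAC_LEN:]
    expected = hmac.new(key, message, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, received_mac):
        # MACs differ: the message was tampered with; discard it.
        raise ValueError("message integrity check failed")
    return message

key = b"shared-secret"
packet = attach_mac(b"hello", key)
assert verify_mac(packet, key) == b"hello"
```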
1.3.6.3 Applications
Currently, only DCN and Syslog support SSL-based encryption.
1.3.6.3.1 SSL
SSL authenticates the client and server and encrypts data transmitted between the two parties,
which improves network security.
Some traditional protocols do not have a security mechanism. As a result, data is transmitted in plaintext. To improve security, configure SSL on the clients and server. SSL's data encryption, identity authentication, and message integrity check mechanisms ensure the security of data transmission.
On the DCN network shown in Figure 1-29, an SSL policy is configured on, and a trusted-CA file is loaded to, the GNE and NMS to verify the identity of the certificate owner, sign a digital certificate to prevent eavesdropping and tampering, and manage the certificate and key. After the GNE and NMS authenticate each other and establish a connection, data transmitted between them can be encrypted.
1.3.7 VFM
1.3.7.1 Introduction
Definition
Virtual File Management (VFM) is an interface the system provides for you to manage files.
Purpose
VFM can manage storage devices, directories, and files.
Directory management: VFM allows you to save files in logical hierarchies, query the current
working directory, change a directory, view directory or file information, and create or delete
a directory.
File management: VFM allows you to query, copy, rename, move, delete, and restore files.
1.3.8.1 FTP
1.3.8.1.1 Introduction
When two hosts run different operating systems and use different file structures and character
sets, you can use File Transfer Protocol (FTP) to copy files from one host to the other.
1.3.8.1.3 Principles
1.3.8.1.3.1 FTP
The File Transfer Protocol (FTP), a file transfer standard on the Internet, runs at the
application layer in the TCP/IP protocol suite. FTP is used to transfer files between local and
remote hosts, typically for version upgrades, log downloads, file transfers, and configuration saving. FTP is implemented based on the file system.
FTP uses the client/server architecture, as shown in Figure 1-31.
FTP provides common file operation commands to help you manage the file system, including
file transfer between hosts. You can use an FTP client program outside a router to upload or
download files and access directories on the router. You can also run an FTP client program
on a router to transfer files to other devices or to the FTP server on the router.
FTP Connections
FTP is a standard application protocol based on the TCP/IP protocol suite. It is used to
transfer files between local clients and remote servers. FTP uses two TCP connections to copy
a file from one system to another. The TCP connections are usually established in
client-server mode, one for control (the server port number is 21) and the other for data
transmission (the server port number is 20).
Control connection
A control connection is set up between the FTP client and FTP server.
The control connection always waits for communication between the client and server.
Commands are sent from the client to the server over this connection. The server
responds to the client after receiving the commands.
Data connection
The server uses port 20 to provide a data connection. The server can either set up or
terminate a data connection. When the client sends files in streams to the server, only the
client can terminate the data connection.
FTP supports file transfer in stream mode. The end of each file is indicated by end of file
(EOF). Therefore, new data connections must be set up for each file transfer or directory
list. When a file is transferred between the client and server, a data connection is set up.
Figure 1-32 shows the process of FTP file transfer.
1. The server passively enables port 21 to wait to set up a control connection to the client.
2. The client actively enables a temporary port to send a request for setting up a connection
to the server.
3. After the server receives the request, a control connection is set up between the
temporary port on the client and port 21 on the server.
4. The client sends a command for setting up a data connection to the server.
5. The client chooses a temporary port for the data connection and uses the PORT command to send the port number to the server over the control connection.
6. The server actively enables port 20 to send a request for setting up a data connection.
7. After the client receives the request, a data connection is set up between the temporary
port on the client and port 20 on the server.
Figure 1-33 shows the FTP connection establishment process. In this example, the FTP client
uses temporary port 2345 to establish a control connection and temporary port 2346 to
establish a data connection. The two ports are connected to ports 21 and 20 of the FTP server,
respectively.
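Step 5 above can be sketched as follows. In the PORT command (RFC 959), the client's IP address and data port are encoded as six comma-separated decimal fields, with the port split into a high byte and a low byte. This is a minimal sketch; the function names are illustrative:

```python
def make_port_command(ip: str, port: int) -> str:
    # Encode the client IP address and data port as the six
    # comma-separated decimal fields of an FTP PORT command.
    h1, h2, h3, h4 = ip.split(".")
    p1, p2 = port // 256, port % 256  # high byte, low byte
    return f"PORT {h1},{h2},{h3},{h4},{p1},{p2}"

def parse_port_command(cmd: str) -> tuple:
    # Server side: recover the IP address and port number.
    fields = cmd.split(" ", 1)[1].split(",")
    ip = ".".join(fields[:4])
    port = int(fields[4]) * 256 + int(fields[5])
    return ip, port

# Temporary data port 2346, as in the example above.
cmd = make_port_command("192.168.1.10", 2346)
assert cmd == "PORT 192,168,1,10,9,42"   # 2346 = 9*256 + 42
assert parse_port_command(cmd) == ("192.168.1.10", 2346)
```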
1.3.8.1.4 Applications
FTP applications are as follows:
A Device functions as an FTP client.
A Device functions as an FTP server.
1.3.8.2 TFTP
This section describes basic concepts and principles of the Trivial File Transfer Protocol
(TFTP) and its applications on Huawei devices.
1.3.8.2.1 Introduction
The Trivial File Transfer Protocol (TFTP) runs over the User Datagram Protocol (UDP) and uses port 69.
In TFTP, the client sends a request to the server to read a file, write a file, or establish a connection. A file is transmitted in fixed-length blocks of 512 bytes; a data packet shorter than 512 bytes signals the end of the file transfer. Each data packet carries one numbered data block, which enables the sender to retransmit a data packet that is lost during transmission.
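The block rule just described can be sketched briefly. A file is cut into 512-byte blocks, and a final short block (possibly empty, when the file length is an exact multiple of 512) terminates the transfer:

```python
def split_into_blocks(data: bytes, block_size: int = 512):
    # Split a file into fixed-length 512-byte blocks. A final block
    # shorter than 512 bytes signals the end of the transfer.
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks or len(blocks[-1]) == block_size:
        # File length is an exact multiple of 512: an empty block
        # must still be sent to terminate the transfer.
        blocks.append(b"")
    return blocks

blocks = split_into_blocks(b"A" * 1300)
assert [len(b) for b in blocks] == [512, 512, 276]
# An exact multiple of 512 ends with an empty terminating block.
assert [len(b) for b in split_into_blocks(b"B" * 1024)] == [512, 512, 0]
```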
Compared with FTP, TFTP does not require complex port interactions or access or
authentication control. TFTP applies when no complex interactions exist between the client
and server. For example, you can use TFTP to obtain the system memory image when the
system starts.
1.3.8.2.2 Principles
1.3.8.2.2.1 TFTP
Message Types
A TFTP packet header contains a two-byte operation code, with values defined as follows:
Read request (RRQ): indicates a read request.
Write request (WRQ): indicates a write request.
Data (DATA): indicates data packets.
Acknowledgment (ACK): indicates a positive reply packet.
Error (ERROR): indicates error packets.
Figure 1-36 shows a TFTP packet header.
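A hedged sketch of how such packets are laid out, using the opcode values listed above (RFC 1350; the "octet" mode here corresponds to binary transfer mode):

```python
import struct

# TFTP opcodes: the 2-byte operation code at the start of every packet.
RRQ, WRQ, DATA, ACK, ERROR = 1, 2, 3, 4, 5

def build_rrq(filename: str, mode: str = "octet") -> bytes:
    # Read request: 2-byte opcode, then NUL-terminated filename and mode.
    return (struct.pack("!H", RRQ)
            + filename.encode() + b"\x00"
            + mode.encode() + b"\x00")

def build_data(block_no: int, payload: bytes) -> bytes:
    # DATA packet: opcode 3, 2-byte block number, up to 512 bytes of data.
    return struct.pack("!HH", DATA, block_no) + payload

pkt = build_rrq("config.cfg")
assert struct.unpack("!H", pkt[:2])[0] == RRQ

data_pkt = build_data(1, b"x" * 100)
opcode, block_no = struct.unpack("!HH", data_pkt[:4])
assert (opcode, block_no) == (DATA, 1)
```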
Transfer Modes
TFTP supports the following transfer modes:
Binary mode: used for program file transfers
ASCII mode: used for text file transfers
HUAWEI NE20E-S2 can function only as a TFTP client and transmit files in binary mode.
1.3.8.2.3 Applications
Definition
Configuration: a series of command operations performed on the system to meet service
requirements. These operations still take effect after the system restarts.
Configuration file: a file used to save configurations. You can use a configuration file to
view configuration information. You can also upload a device's configuration file to
other devices for batch management.
A configuration file saves command lines in a text format. (Non-default values of the
command parameters are saved in the file.) Commands can be organized into a basic
command view framework. The commands in the same command view can form a
section. Empty or comment lines can be used to separate different sections. The line
beginning with "#" is a comment line.
Configuration management: a function for managing configurations and configuration
files using a series of commands.
A storage medium can save multiple configuration files. If the location of a device on the
network changes, its configurations need to be modified. To avoid reconfiguring the
device, specify a configuration file for the next startup. The device restarts with new
configurations to adapt to its new environment.
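The configuration file layout described above (command lines grouped into sections, with "#" comment lines as separators) can be parsed with a short sketch. The function name and sample content are illustrative:

```python
def parse_config_sections(text: str):
    # Split a configuration file into sections. Lines beginning with
    # "#" are comment lines and act as section separators; commands
    # in the same command view form one section.
    sections, current = [], []
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            if current:
                sections.append(current)
                current = []
        else:
            current.append(line.strip())
    if current:
        sections.append(current)
    return sections

sample = ("#\n"
          "sysname NE20E\n"
          "#\n"
          "interface GigabitEthernet0/1/0\n"
          " ip address 10.1.1.1 255.255.255.0\n"
          "#\n")
sections = parse_config_sections(sample)
assert sections[0] == ["sysname NE20E"]
assert sections[1][0] == "interface GigabitEthernet0/1/0"
```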
Purpose
Configuration management allows you to lock, preview, and discard configurations, save the
configuration file used at the current startup, and set the configuration file to be loaded at the
next startup of the system.
Benefits
Configuration management offers the following benefits:
Improved efficiency by configuring services in batches
Improved reliability by correcting incorrect configurations
Improved security by minimizing the configuration impact on services
1.3.9.2 Principles
1.3.9.2.1 Two-Phase Validation Mode
Basic Principles
In two-phase validation mode, the system configuration process is divided into two phases.
The actual configuration takes effect after the two phases are complete. Figure 1-39 shows the
two phases of the system configuration process.
1. In the first phase, a user enters configuration commands. The system checks the data
type, user level, and object to be configured, and checks whether there are repeated
configurations. If syntax or semantic errors are found in the command line, the system
displays a message on the terminal to inform the user of the error and cause.
2. In the second phase, the user commits the configuration. The system then enters the
configuration commitment phase and commits the configuration in the candidate
database to the running database.
− If the configuration takes effect, the system adds it to the running database.
− If the configuration fails, the system informs the user that the configuration is
incorrect. The user can enter the command line again or change the configuration.
In two-phase validation mode, if a configuration has not been committed, the symbol "*" is displayed in the corresponding view (except the user view). If all configurations have been committed, the symbol "~" is displayed in the corresponding view (except the user view).
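The two phases above can be sketched as a candidate database feeding a running database on commit. This is a minimal, assumption-laden model, not the device implementation; class and method names are illustrative:

```python
class TwoPhaseConfig:
    # Sketch of two-phase validation: edits accumulate in a per-user
    # candidate database and reach the running database only on commit.
    def __init__(self):
        self.running = {}    # configurations that have taken effect
        self.candidate = {}  # uncommitted configurations

    def configure(self, key, value):
        # Phase 1: syntax/semantic checks would run here; on success
        # the command is stored in the candidate database only.
        self.candidate[key] = value

    def uncommitted(self):
        # While this is non-empty, the "*" prompt would be displayed.
        return {k: v for k, v in self.candidate.items()
                if self.running.get(k) != v}

    def commit(self):
        # Phase 2: commit the candidate configuration to the running
        # database; entries identical to the running database are
        # skipped as repeated configuration.
        self.running.update(self.uncommitted())
        self.candidate.clear()

cfg = TwoPhaseConfig()
cfg.configure("sysname", "NE20E")
assert cfg.running == {}                     # not yet effective
cfg.commit()
assert cfg.running == {"sysname": "NE20E"}   # now effective
```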
Validity Check
After users enter the system view, the system assigns each user a candidate database. Users
perform configuration operations in their candidate databases, and the system checks the
validity of each user's configurations.
In two-phase validation mode, the system checks configuration validity and displays error
messages. The system checks the validity of the following configuration items:
Repeated configuration
The system checks whether configurations in the candidate databases are identical to
those in the running database.
− If configurations in the candidate databases are identical to those in the running
database, the system does not commit the configuration to the running database and
displays repeated configuration commands.
− If configurations in the candidate databases are different from those in the running
database, the system commits the configuration to the running database.
Data type
Commands available for each user level
Existence of the object to be configured
Benefits
The two-phase validation mode offers the following benefits:
Allows several service configurations to take effect as a whole.
Allows users to preview configurations in the candidate database.
Clears configurations that do not take effect if an error occurs or the configuration does
not meet expectations.
Minimizes the impact of configuration procedures on the existing services.
Basic Concepts
Configuration: a set of specifications and parameters about services or physical resources.
These specifications and parameters are visible to and can be modified by users.
Configuration operation: a series of actions taken to meet service requirements, such as
adding, deleting, or modifying the system configurations.
Configuration rollback point: Once a user commits a configuration, the system
automatically generates a configuration rollback point and saves the difference between
the current configuration and the historical configuration at this configuration rollback
point.
Principles
As shown in Figure 1-41, a user committed configurations N times. Rollback point N
indicates the most recent configuration the user committed. The configuration rollback
procedure is as follows:
1. The user determines to roll the system configuration back to rollback point X based on
the comparison between the historical and current configurations.
2. After the user performs the configuration rollback operation, the system rolls back to the
historical state at rollback point X and generates a new rollback point N+1, which is
specially marked.
Configurations at rollback points N+1 and X are identical.
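The rollback procedure above can be sketched as a list of configuration snapshots, where rolling back to point X appends a new point N+1 identical to X. This is an illustrative model only:

```python
class RollbackPoints:
    # Sketch: each commit records a rollback point; rolling back to
    # point X restores that configuration and generates a new,
    # specially marked point N+1 with identical contents.
    def __init__(self):
        self.points = []  # snapshots; index = rollback point number

    def commit(self, config: dict):
        self.points.append(dict(config))

    def rollback_to(self, x: int) -> dict:
        restored = dict(self.points[x])
        self.points.append(restored)  # new rollback point N+1
        return restored

rp = RollbackPoints()
for cfg in ({"a": 1}, {"a": 1, "b": 2}, {"a": 3, "b": 2}):
    rp.commit(cfg)
restored = rp.rollback_to(1)
assert restored == {"a": 1, "b": 2}
assert rp.points[-1] == rp.points[1]  # points N+1 and X are identical
```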
Usage Scenario
Users can check the system running state after committing system configurations. If a fault or
an unexpected result (such as service overload, service conflict, or insufficient memory
resources) derived from misoperations is detected during the check, the system configurations
must roll back to a previous version. Without configuration rollback, the system allows users to delete or modify system configurations only one by one.
Configuration rollback addresses this issue by allowing users to restore the original
configurations in batches.
The system automatically records configuration changes each time a change is made.
Users can specify the historical state to which the system configurations are expected to
roll back based on the configuration change history.
For example, a user has committed four configurations and four consecutive rollback points
(A, B, C, and D) are generated. If an error is found in configurations committed at rollback
point B, configuration rollback allows the system to roll back to the configurations at rollback
point A.
Configuration rollback significantly improves maintenance efficiency, reduces maintenance
costs, and minimizes error risks when configurations are manually modified one by one.
Benefits
Configuration rollback brings significant benefits for users in terms of configuration security
and system maintenance.
Minimizes impact of mistakes caused by misoperations. For example, if a user
mistakenly runs the undo bgp command, Border Gateway Protocol (BGP)-related
configurations (such as peer configurations) are deleted. Configuration rollback allows
the system to roll back configurations to what they were before the user ran the undo
bgp command.
Facilitates feature testing: When a user is testing a feature, the system generates only one
rollback point if all the feature-related configurations are committed at the same time.
Before the user tests another feature, configuration rollback allows the system to roll
back configurations to what they were before the previous feature was tested, ruling out
the possibility that the previous feature affects the one to be tested.
Functions properly regardless of whether the device restarts. A configuration rollback
point remains after a device restarts. If any change is made after the restart, the system
automatically generates a non-user-triggered configuration rollback point and saves it.
Users can determine whether to roll system configurations back to what they were before
the device restarts.
Usage Scenario
Deploying unverified new services directly on live network devices may affect the current
services or even disconnect devices from the network management system (NMS). To address
this problem, you can deploy configuration trial run. Configuration trial run will roll back the
system to the latest rollback point by discarding the new service configuration if the new
services threaten system security or disconnect devices from the NMS. This function
improves system security and reliability.
Principles
Configuration trial run takes effect only in two-phase configuration validation mode.
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
Network planning engineers
Commissioning engineers
Data configuration engineers
System maintenance engineers
Security Declaration
Encryption algorithm declaration
The encryption algorithms DES/3DES/SKIPJACK/RC2/RSA (RSA-1024 or lower)/MD2/MD4/MD5 (in digital signature scenarios and password encryption)/SHA1 (in digital signature scenarios) provide low security and may bring security risks. If the protocols allow, using more secure encryption algorithms, such as AES/RSA (RSA-2048 or higher)/SHA2/HMAC-SHA2, is recommended.
Password configuration declaration
− Do not set both the start and end characters of a password to "%^%#". Otherwise, the password will be displayed directly in the configuration file.
− To further improve device security, periodically change the password.
Personal data declaration
Your purchased products, services, or features may use some of users' personal data during service operation or fault locating. You must define user privacy policies in compliance with local laws and take proper measures to fully protect personal data.
Feature declaration
− The NetStream feature may be used to analyze the communication information of
terminal customers for network traffic statistics and management purposes. Before
enabling the NetStream feature, ensure that it is performed within the boundaries
Special Declaration
This document serves only as a guide. The content is written based on device
information gathered under lab conditions. The content provided by this document is
intended to be taken as general guidance, and does not cover all scenarios. The content
provided by this document may be different from the information on user device
interfaces due to factors such as version upgrades and differences in device models,
board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are
beyond the scope of this document.
The maximum values provided in this document are obtained in specific lab
environments (for example, only a certain type of board or protocol is configured on a
tested device). The actually obtained maximum values may be different from the
maximum values provided in this document due to factors such as differences in
hardware configurations and carried services.
Interface numbers used in this document are examples. Use the existing interface
numbers on devices for configuration.
The pictures of hardware in this document are for reference only.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
Indicates an imminently hazardous situation which, if not
avoided, will result in death or serious injury.
Symbol Description
Indicates a potentially hazardous situation which, if not
avoided, may result in minor or moderate injury.
Change History
Updates between document issues are cumulative. Therefore, the latest document issue
contains all updates made in previous issues.
Changes in Issue 03 (2017-09-20)
This issue is the third official release. The software version of this issue is
V800R009C10SPC200.
Changes in Issue 02 (2017-07-30)
This issue is the second official release. The software version of this issue is
V800R009C10SPC100.
Changes in Issue 01 (2017-05-30)
This issue is the first official release. The software version of this issue is
V800R009C10.
1.4.2 VS
1.4.2.1 Introduction
Definition
A network administrator divides a physical system (PS) into several virtual systems (VSs)
using hardware and software simulation. Each VS performs independent routing tasks. VSs
share software and hardware resources, including main boards (MBs) and line cards (LCs),
but each interface works only for one VS.
Background
As demand for various types of network services grows, network management becomes more complex, and requirements for service isolation, system security, and reliability are steadily increasing. The virtual private network (VPN) technique can be used to isolate services on a PS. However, if a module failure occurs on the PS, all services configured on the PS will be interrupted.
To prevent service interruptions, the VS technique is used to partition a PS into several VSs.
Each VS functions as an independent network element and uses separate physical resources to
isolate services.
Further development of distributed routing and switching systems allows the VS technique to fully utilize the service processing capability of a single PS. The VS technique helps simplify network deployment and management and strengthens system security and reliability.
Benefits
This feature offers the following benefits to carriers:
Service integrity: Each VS has all the functions of a common router to carry services.
Each VS has an independent control plane, which allows rapid response to future
network services and makes network services more configurable and manageable.
Service isolation: A VS is a virtual router on both the software and hardware planes. A
software or hardware fault in a VS does not affect other VSs. The VS technique ensures
network security and stability.
Expenditure reduction: As an important feature of new-generation IP bearer devices, VSs
play an active role in centralized operation of service provider (SP) services, reducing capital expenditure (CAPEX) and operational expenditure (OPEX).
1.4.2.2 Principles
1.4.2.2.1 Basic VS Concepts
Concepts
Admin-VS and common VS
A virtual system (VS) is classified as an Admin-VS or a common VS.
Common VS: The network administrator uses hardware-level and software-level emulation to
partition a physical system (PS) into VSs. Each interface works only for one VS, and each VS
runs individual routing tasks. VSs share software and hardware resources, including main
boards (MBs) and line cards (LCs).
Admin-VS: Each PS has a default VS named Admin-VS. All unallocated interfaces belong to
this VS. The Admin-VS can process services in the same way as a common VS. In addition,
the PS administrator can use the Admin-VS to manage VSs.
Services are isolated between different VSs, but configuration and log files are not. This mode applies to scenarios with low requirements for VS independence and security.
After you create a VS, allocate hardware resources to the VS.
In Figure 1-43, a PS is partitioned into VSs. Each VS functions as an individual router to
process services.
Implementation
VSs share all resources except interfaces.
VSs can use the same system software to carry various services. In Figure 1-43, VS1 carries
voice services, VS2 carries data services, and VS3 carries video services. Each type of service
is transmitted through a separate VS, and these services are isolated from one another.
VSs can be configured on both physical and logical interfaces, and an interface can be assigned to only a single VS. Logical interfaces configured on a physical interface work for the same VS in the PS to which the LC belongs.
VS partitioning does not require additional hardware; however, a PS must have sufficient interfaces on which VSs can be configured.
VS Authority Management
Table 1-3 shows VS authority management.
PS administrator √ √
VS administrator - -
A VS administrator can perform operations only on the managed VS, including starting and stopping
the allocated services, configuring routes, forwarding service data, and maintaining and managing
the VS.
On the NE20E, physical interfaces can be directly connected so that different VSs on the same
Physical System (PS) can communicate.
1.4.2.3 Applications
Virtual system (VS) applications are as follows:
Different routing instances are isolated, which is more secure and reliable than route
isolation implemented using VPN.
Physical resources of a device can be fully utilized. For example, without the VS technique, on a device with 16 interfaces, if only 4 interfaces are needed to transmit services, the other 12 interfaces remain idle, wasting resources.
Devices of different roles are integrated to simplify network tests, deployment,
management, and maintenance.
Links between devices are simplified into internal buses that are of higher reliability,
higher performance, and lower cost.
In Figure 1-45, a physical device can serve as both aggregation and core nodes (such as the
BRAS, PE, and P), which simplifies network topology and network management and
maintenance.
Definition
Information management classifies output information, effectively filters out information, and
outputs information to a local device or a remote server.
Purpose
The information management function helps users:
Locate faults effectively.
Classify and filter out information.
Send information to a network management station (workstation) to help a network
administrator monitor routers and locate faults.
1.4.3.2 Principles
1.4.3.2.1 Information Classification
Table 1-4 describes information that can be classified as logs, traps, or debugging information
based on contents, users, and usage scenarios.
Type Description
Logs: Logs are records of events and unexpected activities of managed objects. Logging is an important method of maintaining operations and identifying faults. Logs provide information for fault analysis and help an administrator trace user activities, manage system security, and maintain a system.
Some logs are used by technical support personnel for troubleshooting only. Because such logs have no practical instruction significance to users, users are not notified when the logs are generated. Logs are classified as user logs or diagnostic logs.
User logs: During device running, the log module in the host software records all running information in logs. The logs are saved in the log buffer, sent to the Syslog server, reported to the NMS, and displayed on the screen. Such logs are user logs. Users can view the compressed log files and content.
Diagnostic logs: The logs recorded after the device is started but
before the logserver component starts are diagnostic logs. Such logs are recorded in the process-side black box; they are not saved in the log buffer, sent to the Syslog server, reported to the NMS, or displayed on the screen. Users can view the compressed log files and content.
NOTE
The information recorded in diagnostic logs is used for troubleshooting only and does not contain any sensitive information.
Traps: Traps are sent to a workstation to report urgent and important events, such as the restart of a managed device. In general, the system also generates a log with the same content after generating a trap, except that the trap contains an additional OID.
Debugging information: Debugging information shows the device's running status, such as the sending or receiving of data packets. A device generates debugging information only after debugging services are enabled.
Overview
Identifying fault information is difficult if there is a large amount of information. Setting
information levels allows users to rapidly identify information.
Information Levels
Table 1-6 describes eight information severities. The lower the severity value, the higher the
severity.
Logs can be output or filtered based on a specified severity value. A device can output logs
with severity values less than or equal to the specified value. For example, if the log severity
value is set to 6, the device only outputs logs with severity values 0 to 6.
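The filtering rule just stated can be sketched briefly. The eight severity names below follow the standard syslog convention and are an assumption here, since Table 1-6 itself is authoritative:

```python
# Severity values 0-7: the lower the value, the higher the severity.
SEVERITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "informational", "debugging"]

def filter_logs(logs, max_severity: int):
    # Output only logs whose severity value is less than or equal
    # to the configured value.
    return [msg for sev, msg in logs if sev <= max_severity]

logs = [(3, "link down"), (6, "user login"), (7, "packet trace")]
# With the severity value set to 6, severity-7 logs are dropped.
assert filter_logs(logs, 6) == ["link down", "user login"]
```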
1.4.3.3 Applications
1.4.3.3.1 Monitoring Network Operations Using Collected Information
Information can be collected and used to monitor network operations. The collected
information includes active trap messages, historical trap messages, key events, operation
information, and historical performance data.
Command lines can be used to query collected information on a device. The information can
also be sent to a specified terminal or syslog server using the Syslog protocol.
This feature can be configured only on physical systems (PSs) but takes effect on all virtual systems (VSs).
1.4.4.1 Introduction
Definition
The fault management function is one of five functions (performance management,
configuration management, security management, fault management, and charging
management) that make up a telecommunications management network. The primary
purposes of this function are to monitor the operating anomalies and problems of devices and
networks in real time and to monitor, report, and store data on faults and device running
conditions. Fault management also provides alarms, helping users isolate or rectify faults so
that affected services can be restored.
Purpose
With the popularity of networks, complexity of application environments, and expansion of
network scales, our goal must be to make network management more intelligent and effective.
Improving and optimizing fault management will help us meet this goal. Improved fault
management can achieve the following:
Reduction in the volume of alarms generated
Alarm masking, alarm correlation analysis and suppression, and alarm continuity
analysis functions are supported to provide users with the most direct and valid fault
alarm information and to lighten the load on the fault management system. Such support
for efficient fault location and diagnosis enhances the ability of the network element (NE)
management system to manage same-network NEs and cross-network NEs.
Guaranteed alarm reporting
Use of the active alarm table and internal reliability guarantee mechanism allows alarms
to be displayed immediately so that faults can be rapidly and correctly located and
analyzed.
1.4.4.2 Principles
Alarms are reported if a fault is detected. Classifying, associating, and processing received
alarms help keep you informed of the running status of devices and helps you locate and
analyze faults rapidly.
Table 1-9 lists the alarm functions supported by Huawei devices.
Function: Alarm masking
Description: Maintenance engineers can configure alarm masking on terminals so that the terminals detect only alarms that are not masked. This function helps users ignore alarms that do not need to be displayed.
Function: Alarm suppression
Description: Alarm suppression is classified as jitter suppression or correlation suppression.
Jitter suppression: uses alarm continuity analysis so that the device does not report an alarm if a fault lasts only a short period of time, and displays a stable alarm if a fault flaps.
Correlation suppression: uses alarm correlation rules to reduce the number of reported alarms, reducing the network load and facilitating fault locating.
Alarm continuity analysis aims to differentiate events that require analysis and attention from
those that do not and to filter out unstable events.
Continuity analysis starts a timer when an event, such as fault occurrence or fault rectification, occurs. If the event persists for a specified period of time, an alarm is sent. If the event is cleared within that period, it is filtered out and no alarm is sent. Consequently, a fault that lasts only a short period of time is filtered out and not reported, and only stable fault information is displayed when a fault flaps.
Figure 1-53 shows the alarm generated if a fault flaps.
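The continuity analysis described above behaves like a debounce filter. The following is a minimal sketch; the hold time and event model are illustrative assumptions, not the device's actual parameters:

```python
# Sketch: report an alarm only if the fault state persists for `hold` seconds.
# Events are (timestamp, state) pairs; state True means the fault is present.

def continuity_filter(events, hold, end_time):
    """Return the times at which alarms are reported after jitter suppression."""
    reports = []
    fault_since = None
    reported = False
    for t, fault in events + [(end_time, None)]:
        # Before processing this transition, check whether the pending fault
        # has been stable long enough to report.
        if fault_since is not None and not reported and t - fault_since >= hold:
            reports.append(fault_since + hold)
            reported = True
        if fault is True and fault_since is None:
            fault_since = t
        elif fault is False:
            fault_since = None  # fault cleared quickly: filter it out
            reported = False
    return reports

# A flapping fault: the 1-second blips are filtered; only the stable fault
# that starts at t=10 produces an alarm (reported once the hold time elapses).
flapping = [(0, True), (1, False), (2, True), (3, False), (10, True)]
alarms = continuity_filter(flapping, hold=5, end_time=20)
```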
alarm, a correlative alarm, or an independent alarm. If the alarm needs to be sent to a Simple Network Management Protocol (SNMP) agent and forwarded to the network management system (NMS), the system determines whether NMS-based correlative alarm suppression is configured.
If NMS-based correlative alarm suppression is configured, the system filters out
correlative alarms and reports only root alarms and independent alarms to the NMS.
If NMS-based correlative alarm suppression is not configured, the system reports root
alarms, correlative alarms and independent alarms to the NMS.
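The reporting rule above amounts to a filter on the alarm class. A minimal sketch (the class names and alarm records are illustrative):

```python
# Sketch: with NMS-based correlative alarm suppression configured, only root
# and independent alarms are reported; correlative alarms are filtered out.

def alarms_to_report(alarms, suppression_configured):
    if suppression_configured:
        return [a for a in alarms if a["class"] in ("root", "independent")]
    # Without suppression, root, correlative, and independent alarms are all reported.
    return list(alarms)

alarms = [
    {"id": 1, "class": "root"},
    {"id": 2, "class": "correlative"},
    {"id": 3, "class": "independent"},
]
```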
1.4.5.1 Introduction
Definition
The performance management feature periodically collects performance statistics on a device
to monitor the performance and operating status of the device. This feature allows you to
evaluate, analyze, and predict device performance with current and historical performance
statistics.
Purpose
The performance management feature is essential to device operation and maintenance. This
feature provides current and historical statistics about performance indicators, helping you to
determine the device operating status and providing a reference for you to locate faults and
perform configurations.
Analysis on performance statistics helps you to predict the device performance trend. For
example, by analyzing the peak and valley values of user traffic during a day, you can predict
the network traffic growth trend and speed in the next 30 days or longer.
Performance statistics provide a reference for you to optimize network configuration and
make network capacity expansion decisions.
1.4.5.2 Principles
The performance management feature is implemented using the statistics collection function. It allows you to configure the statistics period, statistics instances, performance indicators, and the interval at which statistics files are generated for a performance statistics task. After a performance statistics task is run, the device collects performance indicator values within the specified statistics period and calculates statistical values at the end of each statistics period. The device saves performance statistics as files at the specified intervals.
After a performance statistics task is configured, the performance management module starts
to periodically collect performance statistics specified in the task.
The statistics include interface-based or service-based traffic statistics. The statistics items are
as follows:
Traffic volume collected during a statistics period
Traffic rate calculated by dividing the traffic volume collected during a statistics period
by the length of the period
Bandwidth usage of statistics objects
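The statistics items above can be sketched numerically. The counter values and link capacity below are made-up figures for illustration:

```python
# Sketch: derive the traffic rate and bandwidth usage from a traffic volume
# collected over one statistics period.

def traffic_rate(volume_bytes, period_seconds):
    """Average traffic rate in bits per second over the statistics period."""
    return volume_bytes * 8 / period_seconds

def bandwidth_usage(volume_bytes, period_seconds, capacity_bps):
    """Bandwidth usage as a fraction of the link capacity."""
    return traffic_rate(volume_bytes, period_seconds) / capacity_bps

# 900 MB collected in a 15-minute (900 s) period on a 100 Mbit/s link:
rate = traffic_rate(900_000_000, 900)                    # bits per second
usage = bandwidth_usage(900_000_000, 900, 100_000_000)   # fraction of capacity
```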
The statistics can be the peak, valley, or average values collected during a statistics period, or
the snapshot values collected at the end of a statistics period. The maximum, minimum,
average, and current values of the ambient temperature are examples of such statistics.
The statistics collection function supports many types of statistics tasks. A statistics task can
be bound to a statistics period and multiple statistics instances.
You can query current and historical performance statistics or clear current performance
statistics using either commands or a network management system (NMS).
1.4.5.3 Applications
If a network device supports the performance management feature, the network management
system (NMS) can deliver performance management tasks to collect and analyze the
performance statistics of the device, as shown in Figure 1-54.
The NMS can convert received performance statistics files to files recognizable to a
third-party NMS and transfer these files to the third-party NMS for processing.
1.4.6.1 Introduction
Definition
If the performance of the current system software does not meet requirements, you can
upgrade the system software package or maintain the device to enhance the system
performance. Specific operations involve:
System software upgrade and patch installation
GTL file update
Purpose
You can select an appropriate operation to upgrade or maintain the device based on the actual situation. Application scenarios of these operations are as follows:
Upgrade
− System software upgrade
System software upgrade can optimize device performance, add new features, and
upgrade the current software version.
− Patch installation
Patches are a type of software compatible with system software. They are used to
fix urgent bugs in system software. You can upgrade the system by installing
patches, without having to upgrade the system software.
GTL file update
A GTL file controls all resource and function items that can be used by a device. All
service features that have been configured on devices can be enabled only when a GTL
file is obtained from Huawei. GTL file update does not require software upgrade or
affect existing services.
Benefits
To add new features to a device or optimize device performance, or if the current resource
files (including the system software and GTL file) do not meet requirements, you can choose
to upgrade software, install patches, or update the GTL file as needed.
1.4.6.2 Principles
1.4.6.2.1 Software Management
Background
Software management is a basic feature on a device. It involves various operations, such as
software installation, software upgrade, software version rollback, and patch operations.
Basic Concepts
Obtain the system software of the latest version and its matching documentation from Huawei.
Before uploading the system software onto a device, ensure that sufficient storage space is available on
the master and slave control boards.
Install or upgrade the system software by following the procedure described in an installation or upgrade
guide released by Huawei.
When you install or upgrade the system software, enable the log and alarm management functions to
record installation or upgrade operations on a device. The recorded information helps diagnose faults if
installation or an upgrade fails.
Software installation
A device can load software onto all main control boards simultaneously, which
minimizes the loading time.
Software upgrade
Software can be upgraded to satisfy network and service requirements on a live network.
Software version rollback
If the target software fails to satisfy service requirements or transmit services, software
can be rolled back to the source version.
Patch operations
Installing the latest patch optimizes software capabilities and fixes software bugs.
Installing the latest patch also dynamically upgrades software on a running device, which
minimizes negative impact on services and improves communication quality.
Digital signature for a software package
The digital signature mechanism checks validity and integrity of software packages to
ensure that the software installed on a device is secure and reliable.
After a software package is released, it has security risks in the transfer, download,
storage and installation phases, such as components being replaced or tampered with. A
digital signature is packed into a software package before it is released and validated
before the software package is loaded to a device. The software package is considered
complete and reliable for further installation and use only after the verification on the
digital signature succeeds.
Digital signatures are verified when you set the next-startup patch or system software
package, or load a patch.
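The validation step above can be illustrated with a simplified integrity check. The real mechanism verifies a cryptographic digital signature over the package, which additionally proves who produced it; the pass/fail flow, however, is the same. The package contents and digests below are invented for the sketch:

```python
import hashlib

# Simplified sketch: refuse to load a software package whose digest does not
# match the value recorded when the package was released. A real digital
# signature also authenticates the publisher, not just the contents.

def package_is_trusted(package_bytes, expected_digest):
    actual = hashlib.sha256(package_bytes).hexdigest()
    return actual == expected_digest

released = b"system-software-package-contents"
digest_at_release = hashlib.sha256(released).hexdigest()

# A package tampered with during transfer, download, or storage fails the check.
tampered = released + b"-malicious"
```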
Background
Communication devices consist of multiple embedded computer systems, whose software may be vulnerable to viruses, tampering by attackers, Trojan horses, and unauthorized programs. Once a system is attacked, the attacker may modify configurations or intercept packets to tamper with or steal data.
The trusted computing function allows the system to discover trusted status issues in time,
thereby improving system security and reliability.
Related Concepts
Trusted system: A trusted system indicates that system hardware and software are running
properly as designed. The prerequisite for a trusted system is that the system software
integrity is good without being intruded or tampered with.
Basic Principles
Trusted computing uses the chip and initial startup code of the trusted platform module (TPM)
to establish a trust root for the trusted computing platform.
During startup, the system establishes a complete trust chain from the trust root, BIOS,
bootloader, OS Kernel, to applications, with each level measuring the boot phase of the next
level. The measurement results are irrevocably saved to the TPM chip. This implementation
ensures:
Setup and transmission of the trust chain.
Recording of the system's trusted status.
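The measure-then-extend pattern of the trust chain can be sketched as follows. The hash chain mimics how a TPM extends a platform configuration register (PCR); the stage names and images are illustrative:

```python
import hashlib

# Sketch: each boot stage measures (hashes) the next stage and irreversibly
# folds the measurement into an accumulator, as a TPM extends a PCR:
#     pcr = SHA-256(pcr || SHA-256(stage_image))

def extend(pcr, stage_image):
    measurement = hashlib.sha256(stage_image).digest()
    return hashlib.sha256(pcr + measurement).digest()

def boot_measurements(stages):
    pcr = b"\x00" * 32  # initial PCR value
    for image in stages:
        pcr = extend(pcr, image)
    return pcr

good_chain = [b"bios", b"bootloader", b"kernel", b"applications"]
reference = boot_measurements(good_chain)

# Tampering with any stage changes the final value, so the intrusion is
# detectable by comparing against the recorded reference.
tampered = boot_measurements([b"bios", b"evil-bootloader", b"kernel", b"applications"])
```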
Benefit
This feature offers the following security benefits:
Trusted start
Provides software integrity measurement, setup and transmission of trusted links, and
records the trusted status of the system.
Trusted status query
Provides query of the trusted status of the system.
Trusted status alarm
Generates an alarm if the trusted status of the system is abnormal.
Software Upgrade
At present, the NE20E supports the following type of software upgrade: software upgrade that takes effect at the next startup.
Software upgrade that takes effect at the next startup
A new name is specified for the system software of the target version. After the device is
restarted, the system automatically uses the new system software. In this manner, the
device is upgraded.
Background
The system software of a running device may need to be upgraded to correct existing errors or
add new functions to meet service requirements. The traditional way is to disconnect the
device from the network and upgrade the system software in offline mode. This method is
service-affecting.
Patches are specifically designed to upgrade the system software of a running device with
minimum or no impact on services.
Basic Concepts
A patch is an independent software unit used to upgrade system software.
Patches are classified as follows based on loading modes:
Incremental patch: A device can have multiple incremental patches installed. The latest
incremental patch contains all the information of previous incremental patches.
Non-incremental patch: A device can have only one non-incremental patch installed. If
you want to install an additional patch for a device on which a non-incremental patch
exists, uninstall the non-incremental patch first.
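The loading-mode rules above can be expressed as a small check. This is a sketch; the device's actual patch manager enforces additional conditions:

```python
# Sketch: a device may hold many incremental patches but at most one
# non-incremental patch, which must be uninstalled before any new patch
# can be installed.

def can_install_new_patch(installed):
    """installed: list of {'name', 'incremental'} dicts already on the device."""
    # If a non-incremental patch exists, it must be uninstalled first.
    return not any(not p["incremental"] for p in installed)

incremental_only = [{"name": "P1", "incremental": True},
                    {"name": "P2", "incremental": True}]
with_non_incremental = [{"name": "NP", "incremental": False}]
```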
Patches are classified as follows according to how they take effect:
Hot patch: The patch takes effect immediately after it is installed. Installing a hot patch
does not affect services.
Cold patch: The patch does not take effect immediately after it is installed. You must
reset the corresponding board or subcard or perform a master/slave main control board
switchover for the patch to take effect. Installing a cold patch affects services.
Principles
Patches have the following functions:
Correct errors in the source version without interrupting services running on a device.
Add new functions, which requires one or more existing functions in the current system
to be replaced.
Patches are a type of software compatible with the router system software. They are used to
fix urgent bugs in the router system software.
Table 1-11 shows the patch statuses supported by a device.
Status: None
Description: The patch has been saved to the storage medium of the device but is not loaded to the patch area in the memory.
Transition: When a patch is loaded to the patch area in the memory, the patch status is set to Running.
Status: Running
Description: The patch is loaded to the patch area and enabled permanently.
Transition: A patch in the running state can be uninstalled and deleted from the patch area.
Figure 1-55 shows the relationships between the tasks related to patch installation.
In previous versions, after a cold patch is installed, the system instructs users to perform
operations for the patch to take effect. To facilitate patch installation, the system is configured
to automatically perform the operation that needs to be performed for an installed cold patch
to take effect. Before the system performs the operation, the system asks for your
confirmation.
The implementation principles are as follows:
1. When a cold patch is released, its type and impact range are specified in the patch
description.
2. After a cold patch is installed, the system determines which operation to perform based
on the patch description. For example, the system determines whether to reset a board or
subcard based on the impact range of the cold patch. Then, the system displays a
message asking you to confirm whether to perform the operation for the cold patch to
take effect. The system automatically executes corresponding operations based on users'
choices.
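The confirmation flow above can be sketched as follows. The patch-description fields and operation names are illustrative assumptions, not the device's actual data model:

```python
# Sketch: pick the operation a cold patch needs from its description,
# then run it only after the user confirms.

OPERATIONS = {
    "board": "reset affected boards",
    "subcard": "reset affected subcards",
    "system": "perform a master/slave main control board switchover",
}

def activate_cold_patch(description, confirm):
    """description: {'impact_range': ...}; confirm: callable asked before acting."""
    operation = OPERATIONS[description["impact_range"]]
    if confirm(operation):
        return f"executed: {operation}"
    return "operation deferred by user"

# The user confirms, so the system resets the affected boards automatically.
result = activate_cold_patch({"impact_range": "board"}, confirm=lambda op: True)
```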
Benefits
Patches allow you to optimize the system performance of a device with minimum or no
impact on services.
GTL License
NE20E software is subject to a license which specifies features, versions, capacities, and
expiration for the product. It also grants the right of software usage rather than the ownership
of software. After purchasing a license, a customer has specified rights and receives a license
certificate from the vendor.
A GTL helps carriers reduce OPEX and speed up service deployment: all features are deployed on each device, and a license needs to be purchased only for the required features. If the carriers' needs change, they can purchase additional licenses, which makes licensing flexible for current and future business solutions. Another benefit is that the rights to use features can be obtained without upgrading the device software or affecting running services.
A license file is encrypted using the device's serial number as the key. A license can be obtained from the license management server.
1.4.6.3 Applications
1.4.6.3.1 Upgrade Software
Software Upgrade
If the performance of the current system software does not meet requirements, you can update
the system software package to enhance system performance.
There are two methods to obtain a system software package: remote download or local
download. For details on how to obtain a system software package, refer to the configuration
guide of the corresponding product.
Patch Upgrade
During device operation, the system software may need to be modified due to system bugs or
new function requirements. The traditional way is to upgrade the system software after
powering off the device. This, however, interrupts services and affects QoS.
Loading a patch into the system software achieves system software upgrade without
interrupting services on the device and improves QoS.
Terms
None
1.4.7 SNMP
1.4.7.1 Introduction
Definition
Simple Network Management Protocol (SNMP) is a network management standard widely used on TCP/IP networks. With SNMP, a core device, such as a network management station (workstation) running network management software, manages network elements (NEs), such as routers.
SNMP provides the following functions:
A workstation uses Get, Get-Next, and Get-Bulk operations to obtain network resource information.
A workstation uses a Set operation to set Management Information Base (MIB) objects.
A management agent proactively reports traps and informs to notify the workstation of the network status, allowing network administrators to take real-time measures as needed.
Purpose
SNMP is primarily used to manage networks.
There are two types of network management methods:
Network management issues related to software, including application management,
simultaneous file access by users, and read/write access permissions. This guide does not
describe software management in detail.
Management of NEs that make up a network, such as workstations, servers, network
interface cards (NICs), routers, bridges, and hubs. Many of these devices are located far
from the central network site where the network administrator is located. Ideally, a
network administrator should be automatically notified of faults anywhere on the
network. Unlike users, however, routers cannot pick up the phone and call the network
administrator when there is a fault.
To address this problem, some manufacturers produce devices with integrated network
management functions. The workstation can remotely query the device status, and the devices
can use alarms to inform the workstation of events.
Network management involves the following items:
Managed objects: devices, also called NEs, to be monitored
Agent: special software or firmware used to trace the status of managed objects
Workstation: a core device used to communicate with agents about managed objects and
to display the status of these agents
Network management protocol: a protocol run on the workstation and agents to
exchange information
Feature: Access control
Description: This function restricts a user's device administration rights. It gives a user the rights to manage specific objects on devices and therefore provides refined management.
Feature: Authentication and encryption
Description: This function authenticates and encrypts packets transmitted between an NMS and a managed device. It prevents data packets from being intercepted or tampered with, improving data transmission security.
Feature: Error code
Description: Error codes help a network administrator identify and resolve device faults. A wide range of error codes makes it easier for a network administrator to manage devices.
Feature: Trap
Description: Traps are sent from a managed device to an NMS to notify a network administrator of device faults. A managed device does not require an acknowledgement from the NMS after it sends a trap.
Feature: Inform
Description: Informs are sent from a managed device to an NMS to notify a network administrator of device faults. A managed device requires an acknowledgement from the NMS after it sends an inform. If a managed device does not receive an acknowledgement after it sends an inform, the managed device performs the following operations:
Resends the inform to the NMS.
Stores the inform in the inform buffer, which consumes a lot of system resources.
Generates a log.
NOTE
After an NMS restarts, it learns of the informs sent during the restart process.
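The trap/inform difference above can be sketched in a few lines. The retry count, buffer model, and log text are illustrative assumptions:

```python
# Sketch: a trap is fire-and-forget, but an inform is buffered and resent
# until the NMS acknowledges it (or the retries are exhausted).

def send_inform(transmit, max_retries):
    """transmit() sends the inform and returns True if the NMS acknowledged it."""
    buffer = ["inform"]  # consumes system resources while unacknowledged
    logs = []
    for attempt in range(max_retries):
        if transmit():
            buffer.clear()  # acknowledged: release the buffered inform
            return True, buffer, logs
        logs.append(f"inform not acknowledged (attempt {attempt + 1})")
    return False, buffer, logs

# The NMS acknowledges on the second attempt:
acks = iter([False, True])
ok, buf, logs = send_inform(lambda: next(acks), max_retries=3)
```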
Encryption modes:
Data Encryption Standard-56 (DES-56)
3DES168
Advanced Encryption Standard-128 (AES128)
AES192
AES256
NOTE
To ensure high security, do not use the DES-56 or 3DES168 algorithm as the SNMPv3 encryption algorithm.
1.4.7.2 Principles
1.4.7.2.1 SNMP Principles
Figure 1-56 shows a typical Simple Network Management Protocol (SNMP) management
system. The entire system must have a network management station (workstation) that
functions as a network management center for the network and runs management processes.
Each managed object must have an agent process. Management processes and agent processes
use User Datagram Protocol (UDP) to transmit SNMP messages for communication.
A workstation running SNMP cannot directly manage NEs (managed objects) that run a network management protocol other than SNMP. In this situation, the workstation must use proxy agents for management. A proxy agent provides functions such as protocol translation and filtering operations. Figure 1-57 shows how a proxy agent works.
By default, an agent uses port 161 to receive Get, Get-Next, and Set messages, and the workstation uses
port 162 to receive traps.
An SNMP message consists of a common SNMP header, a Get/Set header or a trap header, and variable bindings.
Get/Set Header
The Get or Set header contains the following fields:
Request ID
Error index
When a noSuchName, badValue, or readOnly error occurs, the agent sets an integer in the Response message to specify an offset value for the faulty variable in the list. By default, the offset value in Get-Request messages is 0.
Variable binding (variable-bindings)
A variable binding specifies the variable name and corresponding value, which is empty
in Get or Get-Next messages.
Trap Header
Enterprise
This field is an object identifier of a network device that sends traps. The object
identifier resides in the sub-tree of the enterprise object {1.3.6.1.4.1} in the object
naming tree.
Generic trap type
Table 1-16 lists the generic trap types that can be received by SNMP.
To send a type 2, 3, or 5 trap, you must use the first variable in the trap's variable binding field
to identify the interface responding to the trap.
Specific-code
If an agent sends a type 6 trap, the value in the Specific-code field specifies an event
defined by the agent. If the trap type is not 6, this field value is 0.
Timestamp
This field specifies the duration from when an agent initializes to when an event reported by a trap occurs. The value is expressed in units of 10 ms. For example, a timestamp of 1908 means that the event occurred 19080 ms after initialization of the agent.
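The 10 ms unit can be verified with a one-line conversion:

```python
# Sketch: SNMP trap timestamps (TimeTicks) count hundredths of a second.

def timeticks_to_ms(ticks):
    return ticks * 10

# A timestamp of 1908 means the event occurred 19080 ms (19.08 s) after
# the agent initialized.
elapsed_ms = timeticks_to_ms(1908)
```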
Variable Binding
Variable binding specifies the name and value of one or more variables. In Get or Get-Next
messages, this field is null.
In 1996, the Internet Engineering Task Force (IETF) issued a series of SNMP-associated
standards. These documents defined SNMPv2c and abandoned the security standard in
SNMPv2.
SNMPv2c enhances the following aspects of SNMPv1:
Structure of management information (SMI)
Communication between workstations
Protocol control
SNMPv2c Security
SNMPv2c abandons SNMPv2 security improvements and inherits the message mechanism
and community concepts in SNMPv1.
the security parameters in the PDU header and then send the unpacked PDU to the dispatcher
for processing.
1.4.7.2.6 MIB
A Management Information Base (MIB) specifies variables (MIB object identifiers or OIDs)
maintained by NEs. These variables can be queried and set in the management process. A
MIB provides a structure that contains data on all NEs that may be managed on the network.
The SNMP MIB uses a hierarchical tree structure similar to the Domain Name System (DNS),
beginning with a nameless root at the top. Figure 1-60 shows an object naming tree, one part
of the MIB.
The three objects at the top of the object naming tree are iso, itu-t (formerly ccitt), and the joint ISO/ITU-T node. There are four objects under iso; of these, the number 3 identifies an identified organization. A Department of Defense (DoD) sub-tree, marked dod (6), is under the identified organization (3). Under dod (6) is internet (1). If the only objects being considered are Internet objects, you may begin drawing the sub-tree below the internet object (the square frames in dotted lines with shadow marks in the following diagram) and place the identifier {1.3.6.1} next to the internet object.
One of the objects under the internet object is mgmt (2). The object under mgmt (2) is mib-2
(1) (renamed in MIB-II, defined in 1991). mib-2 is identified by an
OID, {1.3.6.1.2.1} or {internet(1).2.1}.
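The naming-tree walk above can be sketched as a lookup. The tree fragment below hard-codes only the nodes discussed in the text:

```python
# Sketch: resolve a named path in the object naming tree to a numeric OID.
# Each node is (number, children); only the branch discussed above is modeled.

TREE = {
    "iso": (1, {"org": (3, {"dod": (6, {"internet": (1, {
        "mgmt": (2, {"mib-2": (1, {})}),
    })})})}),
}

def resolve(path):
    oid, node = [], TREE
    for name in path:
        number, node = node[name]
        oid.append(number)
    return ".".join(str(n) for n in oid)

internet_oid = resolve(["iso", "org", "dod", "internet"])               # 1.3.6.1
mib2_oid = resolve(["iso", "org", "dod", "internet", "mgmt", "mib-2"])  # 1.3.6.1.2.1
```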
1.4.7.2.7 SMI
Structure of Management Information (SMI) is a set of rules used to name and define
managed objects. It can define the ID, type, access level, and status of managed objects. At
present, there are two SMI versions: SMIv1 and SMIv2.
The following standard data types are defined in SMI:
INTEGER
OCTET STRING
DisplayString
OBJECT IDENTIFIER
NULL
IpAddress
PhysAddress
Counter
Gauge
TimeTicks
SEQUENCE
SEQUENCE OF
1.4.7.2.8 Trap
A managed device sends unsolicited trap messages to notify a network management system
(NMS) that an urgent and significant event has occurred on the managed device. For example,
the managed device restarts. Figure 1-61 shows the process of transmitting a trap message.
If the trap triggering conditions defined for the agent's module are met, the agent sends a trap
message to notify the NMS that a significant event has occurred. Network administrators can
promptly handle the event.
The NMS uses port 162 to receive trap messages from the agent. Trap messages are carried over the User Datagram Protocol (UDP). After the NMS receives trap messages, it does not need to acknowledge them.
Background
The Simple Network Management Protocol (SNMP) communicates management information
between a network management station (NMS) and a device, such as a router, so that the
NMS can manage the device. If the NMS and device use different SNMP versions, the NMS
cannot manage the device.
To resolve this problem, configure SNMP proxy on a device between the NMS and device to
be managed, as shown in Figure 1-62. In the following description, the device on which
SNMP proxy needs to be configured is referred to as a middle-point device.
The NMS manages the middle-point device and managed device as an independent network
element, reducing the number of managed network elements and management costs.
Principles
In Figure 1-63, the middle-point device allows you to manage the network access,
configurations, and system software version of the managed device. The network element
management information base (MIB) files loaded to the NMS include the MIB tables of both
the middle-point device and managed device. After you configure SNMP proxy on the
middle-point device, the middle-point device automatically forwards SNMP requests from the
NMS to the managed device and forwards SNMP responses from the managed device to the
NMS.
The process in which an NMS uses a middle-point device to query the MIB information
of a managed device is as follows:
a. The NMS sends an SNMP request that contains the MIB object ID of the managed
device to the middle-point device.
The engine ID carried in an SNMPv3 request must be the same as the engine
ID of the SNMP agent on the managed device.
If the SNMP request is an SNMPv1 or SNMPv2c packet, a proxy community name must be configured on the middle-point device, with the engine ID of the SNMP agent on the managed device specified. The community name carried in the SNMP request packet must match the community name configured on the managed device.
b. Upon receipt, the middle-point device searches its proxy table for a forwarding
entry based on the engine ID.
If a matching forwarding entry exists, the middle-point device caches the
request and encapsulates the request based on forwarding rules.
If no matching forwarding entry exists, the middle-point device drops the
request.
c. The middle-point device forwards the encapsulated request to the managed device
and waits for a response.
d. After the middle-point device receives a response from the managed device, the
middle-point device forwards the response to the NMS.
If the middle-point device fails to receive a response within a specified period, the
middle-point device drops the SNMP request.
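Steps a to d above reduce to a table lookup keyed by engine ID. The proxy-table contents and field names below are illustrative:

```python
# Sketch: the middle-point device forwards an SNMP request only if its
# proxy table holds a forwarding entry matching the request's engine ID;
# otherwise it drops the request.

PROXY_TABLE = {
    "800007DB01": {"target": "managed-device-1", "community": "proxy-public"},
}

def handle_request(engine_id, request):
    entry = PROXY_TABLE.get(engine_id)
    if entry is None:
        return ("dropped", None)  # no matching forwarding entry
    # Cache and re-encapsulate the request per the forwarding rules.
    forwarded = {"to": entry["target"], "payload": request}
    return ("forwarded", forwarded)

status, fwd = handle_request("800007DB01", {"oid": "1.3.6.1.2.1.1.1.0"})
```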
The process in which a managed device uses a middle-point device to send a notification
to an NMS is as follows:
a. The managed device generates a notification due to causes such as overheating and
sends the notification to the middle-point device.
b. Upon receipt, the middle-point device searches its proxy table for a forwarding
entry based on the engine ID.
Background
AAA is an authentication, authorization, and accounting technique. AAA local users can be
configured to log in to a device through FTP, Telnet, or SSH. However, SNMPv3 supports
only SNMP users, which can be an inconvenience in unified network device management.
To resolve this issue, configure SNMP to support AAA users. AAA users can then access the
NMS, and MIB node operation authorization can be performed based on tasks. The NMS
does not distinguish AAA users and SNMP users.
Figure 1-65 shows the process of an AAA user logging in to the NMS through SNMP.
Figure 1-65 Process of an AAA user logging in to the NMS through SNMP
Principles
Figure 1-66 shows the principles of SNMP's support for AAA users.
1. Create a local AAA user.
If the AAA user needs to log in through SNMP, the user name must contain fewer than 32
characters.
2. Configure the AAA user to log in through SNMP.
3. SNMP synchronizes the AAA user data and updates the SNMP user list. Configure a
mode to authenticate the AAA user and a mode to encrypt the AAA user's data.
The AAA user's authentication and encryption modes are SNMP. An authentication
password is not used.
After the preceding operations are performed, the AAA user can log in to the NMS in the
same way as an SNMP user.
1.4.7.3 Applications
1.4.7.3.1 Monitoring an Outdoor Cabinet Using SNMP Proxy
As shown in Figure 1-68, a Simple Network Management Protocol (SNMP) proxy and the
cabinet control unit (CCU) of a managed device are placed in an outdoor cabinet. The SNMP
proxy enables communication between the network management station (NMS) and managed
device and allows you to manage the configurations and system software version of the
managed device.
Figure 1-68 Networking diagram for monitoring an outdoor cabinet using SNMP proxy
The SNMP proxy is deployed on the main device. The NMS manages each cabinet as a virtual
unit that consists of the main device and monitoring device. This significantly reduces the
number of NEs managed by the NMS, lowering network management costs, facilitating
real-time device performance monitoring, and improving service quality.
1.4.8 NETCONF
1.4.8.1 Introduction
Definition
The Network Configuration Protocol (NETCONF) is an extensible markup language (XML)
based network configuration and management protocol. NETCONF uses a simple remote
procedure call (RPC) mechanism to implement communication between a client and a server.
NETCONF provides a method for a network management system (NMS) to remotely manage
and monitor devices.
Purpose
As networks grow in scale and complexity, the Simple Network Management Protocol
(SNMP) can no longer meet carriers' network management requirements, especially
configuration management requirements. XML-based NETCONF was developed to overcome
this limitation.
Table 1-19 lists the differences between SNMP and NETCONF.
Benefits
NETCONF offers the following benefits:
Facilitates configuration data management and interoperability between different
vendors' devices using XML encoding to define messages and the RPC mechanism to
modify configuration data.
Reduces network faults caused by manual configuration errors.
Improves the efficiency of system software upgrade performed using a configuration
tool.
Provides high extensibility, allowing different vendors to define additional NETCONF
operations.
Improves data security using authentication and authorization mechanisms.
1.4.8.2 Principles
1.4.8.2.1 NETCONF Protocol Framework
Like the open systems interconnection (OSI) model, the NETCONF protocol framework also
uses a hierarchical structure. A lower layer provides services for the upper layer.
The hierarchical structure enables each layer to focus only on a single aspect of NETCONF
and reduces the dependencies between different layers.
Table 1-20 describes the layers of the NETCONF protocol framework.
NOTE
Currently, the NE20E can use only SSH as the transport protocol for
NETCONF.
Layer 2: remote procedure call (RPC)
Example elements: <rpc> and <rpc-reply>
The RPC layer provides a simple and transport-independent framing mechanism for
encoding RPCs. The NETCONF manager uses the <rpc> element to encapsulate RPC
request information and sends the RPC request information to the NETCONF agent
over a secure and connection-oriented session. The NETCONF agent uses the
<rpc-reply> element to encapsulate RPC response information (content at the
operations and content layers) and sends the RPC response information to the
NETCONF manager.
In normal cases, the <rpc-reply> element encapsulates the data required by the
NETCONF manager or information about a configuration success. If the NETCONF
manager sends an incorrect request or the NETCONF agent fails to process a
request from the NETCONF manager, the NETCONF agent encapsulates the
<rpc-error> element containing detailed error information in the <rpc-reply>
element and sends the <rpc-reply> element to the NETCONF manager.
Layer 3: operations
Example elements: <get-config>,
The operation layer defines a series of basic operations used in
RPC. The basic operations constitute basic NETCONF
Although Schema and YANG use different model description syntax, the XML messages
exchanged between the device and the NMS are almost identical when the same model
is used.
Related Concepts
XML encoding
XML is a NETCONF encoding format, allowing complex hierarchical data to be
expressed in a text format that can be read, saved, and manipulated with both traditional
text tools and tools specific to XML. All NETCONF protocol elements are defined in
namespace urn:ietf:params:xml:ns:netconf:base:1.0. Document type declarations must
not be contained in NETCONF content.
XML-based network management uses XML to describe managed data and management
operations, so that devices can parse management information.
XML-based network management has the following advantages:
− Strong data presentation capabilities
− Easy, efficient, and secure large-scale data transmission
− Improved network configuration management
Remote procedure call (RPC) model
NETCONF uses an RPC-based communication model. NETCONF uses XML-encoded
<rpc> and <rpc-reply> elements to provide transport-protocol-independent framing of
NETCONF requests and responses. Table 1-21 describes some commonly used RPC
elements. If a module supports YANG, its capability must provide information about the
YANG model. An example message is as follows:
<capability>http://www.huawei.com/netconf/vrp/huawei-ifm?module=huawei-ifm&revision=2013-01-01</capability>
Elements Description
<rpc> Encapsulates a request that the client sends to the server.
<rpc-reply> Encapsulates a response message for an <rpc> request message. The
server returns a response message, which is encapsulated in the
<rpc-reply>element, for each <rpc> request message.
<rpc-error> Notifies a client of an error occurring during <rpc> request processing.
The server encapsulates the <rpc-error> element in the <rpc-reply>
element and sends the <rpc-reply> element to the client.
<ok> Notifies a client of no errors occurring during <rpc> request processing.
The server encapsulates the <ok> element in the <rpc-reply> element and
sends the <rpc-reply> element to the client.
NETCONF capability
A NETCONF capability is a set of functionalities that supplements the base NETCONF
specification. Capabilities augment the base operations of a device, describing both
additional operations and the content allowed inside operations.
The client can discover the server's capabilities and use any additional operations,
parameters, and content defined by those capabilities.
A capability is identified by a uniform resource identifier (URI).
urn:ietf:params:xml:ns:netconf:capability:{name}:1.0
In addition to the capabilities defined by NETCONF, a vendor can define additional
capabilities to extend management functions.
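The capability URI formats above (a base identifier, optionally followed by module and revision parameters in YANG-announcing capabilities) can be split apart with a short sketch like the following; the helper name is an assumption for illustration:

```python
from urllib.parse import parse_qs

def parse_capability(uri):
    """Split a capability URI into its base identifier and optional
    module/revision parameters carried after '?' (as in YANG-announcing
    capabilities). Illustrative helper, not part of any NETCONF library."""
    base, _, query = uri.partition("?")
    params = parse_qs(query) if query else {}
    return base, {k: v[0] for k, v in params.items()}

base, params = parse_capability(
    "http://www.huawei.com/netconf/vrp/huawei-ifm"
    "?module=huawei-ifm&revision=2013-01-01")
```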
Configuration databases
A configuration database is a complete collection of configuration parameters for a
device. Table 1-22 describes NETCONF-defined configuration databases.
Configuration Database    Description
<running/> Stores the configuration that is currently running on a device. To support
modification of the <running/> configuration database, the device must have the
writable-running capability.
<candidate Stores various configuration parameters that are about to run on a device.
/> An administrator can perform operations on the <candidate/> configuration
database. Modifications to this configuration database do not take effect
before data in the <running/> configuration database is replaced with data in
this configuration database.
To support the <candidate/> configuration database, the device must have the
candidate capability.
NOTE
The <candidate/> configuration databases supported by Huawei devices do not allow
inter-session data sharing. Therefore, the configuration of a <candidate/> configuration
database does not require additional locking operations.
<startup/> Stores a configuration data set loaded during device startup. The
configuration data set is similar to a configuration file.
To support the <startup/> configuration database, the device must have the
distinct startup capability.
Subtree filtering
Subtree filtering allows an application to include particular XML subtrees in the
<rpc-reply> elements for a <get> or <get-config> operation.
Subtree filtering provides a small set of filters for inclusion, simple content exact-match,
and selection. The server does not need to use any data-model-specific semantics during
processing, allowing for simple and centralized implementation policies.
Table 1-23 describes subtree filter components.
Component    Description
Namespace selection    If namespaces are used, the filter output will include only
elements from the specified namespace.
Containment node    A containment node is a node that contains child elements within a
subtree filter. For each containment node specified in a subtree filter, all data model
instances that exactly match the specified namespaces and element hierarchy are
included in the filter output.
Content match node    A content match node is a leaf node that contains simple content
within a subtree filter. A content match node is used to select some or all of its sibling
nodes for filter output and represents an exact-match filter of the leaf node element
content.
Selection node    A selection node is an empty leaf node within a subtree filter. A
selection node represents an explicit selection filter of the underlying data model. The
presence of any selection nodes within a set of sibling nodes causes the filter to select
the specified subtrees and suppress automatic selection of the entire set of sibling
nodes in the underlying data model.
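The three node types above can be illustrated with a simplified, namespace-unaware sketch of subtree-filter matching. The function name and single-element scope are assumptions; a real server implements the full RFC 6241 semantics, including namespace selection:

```python
import xml.etree.ElementTree as ET

def matches(data_elem, filter_elem):
    """Simplified subtree-filter check: a content match node (leaf with text)
    must equal the data leaf's text; a selection node (empty leaf) matches any
    element of that name; a containment node (node with children) recurses."""
    for f_child in filter_elem:
        d_children = data_elem.findall(f_child.tag)
        if len(f_child) == 0 and (f_child.text or "").strip():
            # content match node: exact-match on leaf content
            if not any((d.text or "").strip() == f_child.text.strip()
                       for d in d_children):
                return False
        elif len(f_child) == 0:
            # selection node: selects the subtree if the element is present
            if not d_children:
                return False
        else:
            # containment node: at least one child must match recursively
            if not any(matches(d, f_child) for d in d_children):
                return False
    return True

data = ET.fromstring(
    "<interface><ifName>GE0/0/0</ifName><mtu>1500</mtu></interface>")
flt = ET.fromstring("<interface><ifName>GE0/0/0</ifName></interface>")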
Overview
The NETCONF authorization mechanism regulates user permissions to perform NETCONF
operations and access NETCONF resources.
NETCONF authorization includes:
NETCONF operation authorization: authorizes user information by specific NETCONF
operations, such as <edit-config>, <get>, <sync-full>, <sync-inc>, and <commit>.
Module authorization: authorizes users for specific feature modules, such as Telnet-client,
Layer 3 virtual private network (L3VPN), Open Shortest Path First (OSPF), Fault-MGR,
Device-MGR, and Intermediate System-to-Intermediate System (IS-IS).
Data node authorization: authorizes users for specific data nodes, such as
/ifm/interfaces/interface/ifAdminStatus and /devm/globalPara/maxChassisNum.
The authorization rules for NETCONF operations and data nodes can be configured.
Principles
The NETCONF authorization mechanism is similar to the task authorization mechanism used
to regulate command authorization. NETCONF authorization is also modeled based on
NETCONF access control model (ACM).
1.19.2.21.36 User Group-based and Task Group-based User Management defines tasks, task
groups, and user groups. The task authorization mechanism uses a three-layer permission
control model. This model organizes commands into tasks, tasks into task groups, and task
groups into user groups.
The NETCONF authorization mechanism is based on the task authorization mechanism. The
NETCONF authorization mechanism subscribes to required authorization information from
the task authorization mechanism and stores the obtained information in its local data
structures.
NETCONF authorization rules include operation rules and data node rules. Rule
permissions can be either permit or deny, as specified in the user-group view.
Only permit is allowed in the task-group view. For a schema path, access permission can be
set to read, write, or execute.
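A minimal sketch of the permit/deny evaluation under this model, assuming an illustrative per-user-group rule table (not the device's internal structure):

```python
# Hedged sketch: each user group maps to an ordered list of
# (operation, action) rules; the first matching rule decides, and
# access is denied by default when no rule matches.

def authorize(rules, user_group, operation):
    """Return True only if the first matching rule permits the operation."""
    for rule_op, action in rules.get(user_group, []):
        if rule_op == operation:
            return action == "permit"
    return False

rules = {
    "operators": [("get", "permit"), ("edit-config", "deny")],
}
```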
Benefits
NETCONF authorization is a mechanism to restrict access for particular users to a
pre-configured subset of all available NETCONF protocol operations and contents.
Capabilities Exchange
When a NETCONF session is opened, each NETCONF peer sends a <hello> element
containing a list of its capabilities. If both peers support a capability, they can implement
special management functions based on this capability.
Each NETCONF peer sends its <hello> element simultaneously as soon as the connection is
opened. A NETCONF peer does not wait to receive the capabilities from the other side before
sending its own capabilities.
After a server exchanges <hello> elements with a client, the server waits for <rpc> elements
from the client. The server returns an <rpc-reply> element in response to each <rpc> element.
Figure 1-72 shows the interaction between a server and a client.
<capability>urn:ietf:params:netconf:capability:writable-running:1.0</capability>
<capability>urn:ietf:params:netconf:capability:candidate:1.0</capability>
<capability>urn:ietf:params:netconf:capability:startup:1.0</capability>
<capability>urn:ietf:params:netconf:capability:rollback-on-error:1.0</capability>
<capability>http://www.huawei.com/netconf/capability/sync/1.0</capability>
<capability>http://www.huawei.com/netconf/capability/rollback/1.0</capability>
<capability>http://www.huawei.com/netconf/capability/exchange/1.0</capability>
<capability>http://www.huawei.com/netconf/capability/active/1.0</capability>
<capability>http://www.huawei.com/netconf/capability/action/1.0</capability>
<capability>http://www.huawei.com/netconf/capability/update/1.0</capability>
</capabilities>
</hello>
Example of a <hello> element sent by a server
− Example of a <hello> message sent by a YANG-supported module
<?xml version="1.0" encoding="UTF-8"?>
<hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<capabilities>
<capability>urn:ietf:params:netconf:base:1.0</capability>
<capability>urn:ietf:params:netconf:capability:writable-running:1.0</capability>
<capability>urn:ietf:params:netconf:capability:candidate:1.0</capability>
<capability>urn:ietf:params:netconf:capability:validate:1.0</capability>
<capability>urn:ietf:params:netconf:capability:startup:1.0</capability>
<capability>urn:ietf:params:netconf:capability:rollback-on-error:1.0</capability>
<capability>urn:huawei:netconf:capability:sync:1.0</capability>
<capability>urn:ietf:params:netconf:capability:confirmed-commit:1.0</capability>
<capability>http://www.huawei.com/netconf/capability/sync/1.0</capability>
<capability>http://www.huawei.com/netconf/capability/discard-commit/1.0</capability>
<capability>http://www.huawei.com/netconf/capability/rollback/1.0</capability>
<capability>http://www.huawei.com/netconf/capability/exchange/1.0</capability>
<capability>http://www.huawei.com/netconf/capability/action/1.0</capability>
<capability>http://www.huawei.com/netconf/capability/update/1.0</capability>
</capabilities>
<session-id>1149</session-id>
</hello>
− Example of capabilities in a <hello> message sent by a YANG-supported
module, each carrying YANG model information (module name and revision)
<capability>http://www.huawei.com/netconf/vrp/huawei-ifm?module=huawei-ifm&revision=2013-01-01</capability>
<capability>http://www.huawei.com/netconf/vrp/huawei-pub-types?module=huawei-pub-types&revision=2013-01-01</capability>
Examples of invalid <hello> elements sent by a client
− Example of a <hello> element that does not contain base capability information
<?xml version="1.0"?>
<hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<capabilities>
</capabilities>
</hello>
− Example of an incorrect <hello> element
<?xml version="1.0">
<capabilities>
<capabilities>urn:ietf:params:netconf:base:1.0</capability>
<capabilities>urn:ietf:params:netconf:capability:candidate:1.0</capability
>
</capabilities>
</hello>
The base NETCONF capabilities support the <running/> configuration database. The
following describes the operations defined in base capabilities:
<get-config>
The <get-config> operation retrieves all or part of a specified configuration from the
<running/>, <candidate/>, or <startup/> configuration database. The <get-config>
operation can also retrieve the configuration from a file, for example, <url>huawei.cfg</url>.
If the <get-config> operation is successful, the server sends an <rpc-reply> element
containing a <data> element with the results of the query. Otherwise, the server sends an
<rpc-reply> element containing an <rpc-error> element.
− <rpc> element
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<get-config>
<source>
<running/>
</source>
<filter type="subtree">
<ifm xmlns="http://www.huawei.com/netconf/vrp"
content-version="1.0" format-version="1.0">
<interfaces>
<interface/>
</interfaces>
</ifm>
</filter>
</get-config>
</rpc>
− <rpc-reply> element
<rpc-reply message-id="101"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<data>
<ifm xmlns="http://www.huawei.com/netconf/vrp" format-version="1.0"
content-version="1.0">
<interfaces>
<interface>
<ifIndex>2</ifIndex>
<ifName>Virtual-Template0</ifName>
<ifPhyType>Virtual-Template</ifPhyType>
<ifPosition>
</ifPosition>
<ifParentIfName>
</ifParentIfName>
<ifNumber>0</ifNumber>
<!-- additional <interface> elements appear here... -->
</interface>
</interfaces>
</ifm>
</data>
</rpc-reply>
<get>
The <get> operation retrieves configuration and state data from the <running/>
configuration database.
If the <get> operation is successful, the server sends an <rpc-reply> element containing a
<data> element with the results of the query. Otherwise, the server sends an <rpc-reply>
element containing an <rpc-error> element.
The <get-config> operation can retrieve data from the <running/>, <candidate/>, and <startup/>
configuration databases, whereas the <get> operation can retrieve data only from the <running/>
configuration database. The <source> parameter does not need to be specified in the <rpc> element
for a <get> operation.
The <get-config> operation can retrieve only configuration data, whereas the <get> operation can
retrieve both configuration and state data.
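Assembling a <get-config> request such as the one below can be sketched with a few lines of standard-library XML construction; the helper name is an assumption for illustration:

```python
import xml.etree.ElementTree as ET

BASE_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"

def build_get_config(message_id, source="running"):
    """Assemble a minimal <get-config> RPC targeting the given source
    configuration database. Illustrative helper, not a device API."""
    rpc = ET.Element("rpc", {"message-id": str(message_id), "xmlns": BASE_NS})
    get_config = ET.SubElement(rpc, "get-config")
    src = ET.SubElement(get_config, "source")
    ET.SubElement(src, source)          # e.g. <running/>
    return ET.tostring(rpc, encoding="unicode")

msg = build_get_config(101)
```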
− <rpc> element
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<get>
<filter type="subtree">
<ifm xmlns="http://www.huawei.com/netconf/vrp" content-version="1.0"
format-version="1.0">
<interfaces>
<interface>
<ifName>GigabitEthernet0/0/0</ifName>
</interface>
</interfaces>
</ifm>
</filter>
</get>
</rpc>
− <rpc-reply> element
<rpc-reply message-id="101"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<data>
<ifm xmlns="http://www.huawei.com/netconf/vrp" format-version="1.0"
content-version="1.0">
<interfaces>
<interface>
<ifIndex>5</ifIndex>
<ifName>GigabitEthernet0/0/0</ifName>
<ifPhyType>MEth</ifPhyType>
<!-- additional <interface> elements appear here... -->
</interface>
</interfaces>
</ifm>
</data>
</rpc-reply>
<edit-config>
The <edit-config> operation creates, modifies, or deletes configuration data.
If the <edit-config> operation is successful, the server sends an <rpc-reply> element
containing an <ok> element. Otherwise, the server sends an <rpc-reply> element
containing an <rpc-error> element.
− <rpc> element
<rpc message-id="60" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<edit-config>
<target>
<running/>
</target>
<default-operation>merge</default-operation>
<error-option>rollback-on-error</error-option>
<config>
<ifm xmlns="http://www.huawei.com/netconf/vrp" content-version="1.0"
format-version="1.0">
<interfaces>
<interface>
<ifName>GigabitEthernet0/0/0</ifName>
<ifDescr>chenyuqiao</ifDescr>
</interface>
</interfaces>
</ifm>
</config>
</edit-config>
</rpc>
− <rpc-reply> element
<rpc-reply message-id="60" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
flow-id="104">
<ok />
</rpc-reply>
RPC request:
<rpc message-id="2415" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<get>
<filter type="subtree">
<arp xmlns="http://www.huawei.com/netconf/vrp/huawei-arp"/>
</filter>
</get>
</rpc>
RPC reply:
<?xml version="1.0" encoding="UTF-8"?>
<rpc-reply message-id="2415" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<data>
<arp xmlns="http://www.huawei.com/netconf/vrp/huawei-arp">
<arpSystemInfo>
<learnStrictEnable>false</learnStrictEnable>
<l2TopoDetectEnable>false</l2TopoDetectEnable>
<rateTrapInterval>0</rateTrapInterval>
<arpPassiveLearnEnable>false</arpPassiveLearnEnable>
<arpTopoDetectDisable>false</arpTopoDetectDisable>
</arpSystemInfo>
</arp>
</data>
</rpc-reply>
<copy-config>
The <copy-config> operation creates or replaces an entire configuration database with
the content of another complete configuration database. If the target database exists, it is
overwritten. Otherwise, a new one is created, if allowed.
If the <copy-config> operation is successful, the server sends an <rpc-reply> element
containing an <ok> element. Otherwise, the server sends an <rpc-reply> element
containing an <rpc-error> element.
− <rpc> element
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<copy-config>
<target>
<url>eee.cfg</url>
</target>
<source>
<running/>
</source>
</copy-config>
</rpc>
− <rpc-reply> element
<rpc-reply message-id="101"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
<delete-config>
The <delete-config> operation deletes a configuration database, not the <running/>
configuration database.
If the <delete-config> operation is successful, the server sends an <rpc-reply> element
containing an <ok> element. Otherwise, the server sends an <rpc-reply> element
containing an <rpc-error> element.
− <rpc> element
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<delete-config>
<target>
<startup/>
</target>
</delete-config>
</rpc>
− <rpc-reply> element
<rpc-reply message-id="101"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
<lock>
The <lock> operation locks a configuration database of a device. A locked
configuration database cannot be modified by other clients. Locking eliminates errors
caused by simultaneous database modifications by other clients or by Simple
Network Management Protocol (SNMP) or command-line interface (CLI) scripts.
If the <lock> operation is successful, the server sends an <rpc-reply> element containing
an <ok> element. Otherwise, the server sends an <rpc-reply> element containing an
<rpc-error> element.
If the specified configuration database is already locked by a client, the <error-tag>
element will be "lock-denied" and the <error-info> element will include the <session-id>
of the lock owner.
If the <lock> operation is successful:
− <rpc> element
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<lock>
<target>
<running/>
</target>
</lock>
</rpc>
− <rpc-reply> element
<rpc-reply message-id="101"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/> <!-- lock succeeded -->
</rpc-reply>
If the <lock> operation is unsuccessful:
− <rpc> element
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<lock>
<target>
<running/>
</target>
</lock>
</rpc>
− <rpc-reply> element
<rpc-reply message-id="101"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<rpc-error>
<error-type>protocol</error-type>
<error-tag>lock-denied</error-tag>
<error-severity>error</error-severity>
<error-app-tag>43</error-app-tag>
<error-message>The configuration is locked by other user. [Session ID =
629]</error-message>
<error-info>
<session-id>629</session-id>
<error-paras>
<error-para>629</error-para>
</error-paras>
</error-info>
</rpc-error>
</rpc-reply>
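A client can extract the lock owner from such an <rpc-error> reply with standard-library XML parsing; the element paths below follow the reply shown above:

```python
import xml.etree.ElementTree as ET

# Condensed version of the lock-denied reply shown above.
reply = """<rpc-reply message-id="101"
  xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <rpc-error>
    <error-tag>lock-denied</error-tag>
    <error-info><session-id>629</session-id></error-info>
  </rpc-error>
</rpc-reply>"""

NS = "{urn:ietf:params:xml:ns:netconf:base:1.0}"
root = ET.fromstring(reply)
error = root.find(NS + "rpc-error")
tag = error.findtext(NS + "error-tag")
# The <error-info> of a lock-denied error carries the lock owner's session ID.
owner = error.findtext(NS + "error-info/" + NS + "session-id")
```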
<unlock>
The <unlock> operation releases a configuration lock previously obtained with the
<lock> operation. A client cannot unlock a configuration database that it did not lock.
If the <unlock> operation is successful, the server sends an <rpc-reply> element
containing an <ok> element. Otherwise, the server sends an <rpc-reply> element
containing an <rpc-error> element.
− <rpc> element
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<unlock>
<target>
<running/>
</target>
</unlock>
</rpc>
− <rpc-reply> element
<rpc-reply message-id="101"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
<close-session>
The <close-session> operation gracefully terminates a NETCONF session.
If the <close-session> operation is successful, the server sends an <rpc-reply> element
containing an <ok> element. Otherwise, the server sends an <rpc-reply> element
containing an <rpc-error> element.
− <rpc> element
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<close-session/>
</rpc>
− <rpc-reply> element
<rpc-reply message-id="101"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
<kill-session>
The <kill-session> operation forcibly terminates a NETCONF session. Only an
administrator is authorized to perform this operation.
If the <kill-session> operation is successful, the server sends an <rpc-reply> element
containing an <ok> element. Otherwise, the server sends an <rpc-reply> element
containing an <rpc-error> element.
− <rpc> element
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<kill-session>
<session-id>4</session-id>
</kill-session>
</rpc>
− <rpc-reply> element
<rpc-reply message-id="101"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
If a confirming commit is issued within the specified interval, the committed
configuration data takes effect and becomes the running configuration on the device.
If no confirming commit is issued before the interval elapses, the committed
configuration data is rolled back to the original configuration. The interval can be
configured using the <confirm-timeout> parameter.
<capability> urn:ietf:params:netconf:capability:confirmed-commit:1.0
</capability>
− RPC request
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<commit>
<confirmed/>
<confirm-timeout>120</confirm-timeout>
</commit>
</rpc>
− RPC reply
<rpc-reply message-id="101"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
Rollback on error
This capability indicates that the device can perform a rollback when an error occurs. If
an error occurs and the <rpc-error> element is generated, the server stops performing the
<edit-config> operation and restores the specified configuration to the status before the
<edit-config> operation is performed.
<capability> urn:ietf:params:netconf:capability:rollback-on-error:1.0
</capability>
Distinct startup
This capability indicates that the device can perform a distinct startup. The server checks
parameter availability and consistency during the distinct startup of a device.
The server supports the <startup/> configuration database and can distinguish the <running/>
configuration database from the <startup/> configuration database. To permanently save configuration
data in the <running/> configuration database, perform the <copy-config> operation to copy the
configuration data from the <running/> configuration database to the <startup/> configuration database.
<target>
<flow-id>10</flow-id>
</target>
<source>
<flow-id>1</flow-id>
</source>
</sync-increment>
</rpc>
<rpc-reply> element
<rpc-reply message-id="102"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" hwcontext="vr=1">
<data>
<ifm>
<interfaces>
<interface difference="create">
<interfaceName>ethernet 1/1/1.1</interfaceName>
<mtu>15000</mtu>
<adminStatus>down</adminStatus>
</interface>
<interface difference="delete">
<interfaceName>ethernet 1/1/2.1</interfaceName>
</interface>
<interface difference="modify">
<interfaceName>ethernet 1/1/3</interfaceName>
<mtu>15000</mtu>
<adminStatus>up</adminStatus>
</interface>
<interface difference="modify">
<interfaceName>ethernet 1/1/4</interfaceName>
<ifAm4s>
<ifAm4 difference="create">
<ipAddress>10.164.11.10</ipAddress>
<netMask>255.255.255.0</netMask>
<addressType></addressType>
</ifAm4>
</ifAm4s>
</interface>
</interfaces>
</ifm>
</data>
</rpc-reply>
Active notification
This capability indicates that the device can inform its peer that it is active.
<capability> http://www.huawei.com/netconf/capability/active/1.0 </capability>
For operations that take a long time, such as the <commit> and <copy-config>
operations, the client may consider the request to have timed out and cancel the
operation.
The active notification capability enables the server to periodically send an <active>
element to the client when processing a time-consuming <rpc> element, informing the
client that the <rpc> element is under processing.
Only a device with the active notification capability supports <active> elements.
<active message-id="101"
xmlns="http://www.huawei.com/netconf/capability/base/1.0"> </active>
Action
This capability indicates that the device can perform maintenance operations.
In addition to basic operations, such as additions, deletions, modifications, and queries, a
device also needs to perform some maintenance operations.
The action capability enables a device to perform maintenance operations using <rpc>
elements. The operation results are returned in <rpc-reply> elements. Maintenance
operations usually do not involve configuration data modification or service data
obtaining. A maintenance operation may be deleting a packet counter or resetting a
board.
<capability> http://www.huawei.com/netconf/capability/action/1.0 </capability>
Only a device with the action capability supports the <execute-action> operation.
The <execute-action> operation requests the server to run a maintenance command
(excluding query and basic configuration commands).
If the <execute-action> operation is successful, the server sends an <rpc-reply> element
containing an <ok> element. Otherwise, the server sends an <rpc-reply> element
containing an <rpc-error> element.
− <rpc> element
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="631">
<execute-action
xmlns="http://www.huawei.com/netconf/capability/base/1.0">
<action>
<acl xmlns="http://www.huawei.com/netconf/vrp" content-version="1.0"
format-version="1.0">
<aclResetCount>
<aclNumOrName>2111</aclNumOrName>
</aclResetCount>
</acl>
</action>
</execute-action>
</rpc>
− <rpc-reply> element
<rpc-reply message-id="631"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
Execute CLI
This capability indicates that the device can execute command lines delivered
through NETCONF.
<capability> http://www.huawei.com/netconf/capability/execute-cli/1.0
</capability>
Only a device with the execute-cli capability supports the <execute-cli> operation.
The client performs the <execute-cli> operation to execute CLI commands through
NETCONF. A maximum of 60 commands are allowed in a single <rpc> request, and
each command string can contain a maximum of 512 bytes. The <execute-cli>
operation stops on error: if any command fails to execute, the remaining commands
in the <rpc> request are not executed.
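The limits stated above (at most 60 commands per <rpc> request, 512 bytes per command string) can be checked client-side before a request is sent. The helper below is an illustrative sketch, not part of the device or any NETCONF library:

```python
MAX_COMMANDS = 60       # commands allowed in a single <rpc> request
MAX_COMMAND_LEN = 512   # bytes allowed per command string

def validate_cli_batch(commands):
    """Return (ok, reason) for a batch of CLI commands against the
    documented <execute-cli> limits. Illustrative helper."""
    if len(commands) > MAX_COMMANDS:
        return False, "too many commands"
    for cmd in commands:
        if len(cmd.encode("utf-8")) > MAX_COMMAND_LEN:
            return False, "command too long"
    return True, ""

ok, _ = validate_cli_batch(["display interface brief"])
bad, _ = validate_cli_batch(["x" * 513])
```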
Update
This capability indicates that the device can update configuration data.
<capability> http://www.huawei.com/netconf/capability/update/1.0 </capability>
The <update> operation updates the configuration data in the <candidate/> configuration
database with the latest configuration data in the <running/> configuration database
when a conflict occurs during data commitment.
If the <update> operation is successful, the server sends an <rpc-reply> element
containing an <ok> element. Otherwise, the server sends an <rpc-reply> element
containing an <rpc-error> element.
Only a device with the update capability supports the <update> operation.
− <rpc> element
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<update xmlns="http://www.huawei.com/netconf/capability/base/1.0"/>
</rpc>
− <rpc-reply> element
<rpc-reply message-id="101"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
Exchange
This capability indicates that the device can interact with the request sender in request
processing.
<capability> http://www.huawei.com/netconf/capability/exchange/1.0 </capability>
<capability> http://www.huawei.com/netconf/capability/exchange/1.2 </capability>
Only a device with the exchange capability supports the <get-next> operation.
The client performs the <get-next> operation to interact with the server.
For example, if the returned result of a <get> or <get-config> operation involves a large
amount of data, the server has to return the data using multiple <rpc-reply> elements.
After the client receives the first <rpc-reply> element, the client interacts with the server
to request the next <rpc-reply> element or to cancel the data query.
If the <get-next> operation for obtaining the next <rpc-reply> element is successful, the
server sends an <rpc-reply> element containing a <data> element. Otherwise, the server
sends an <rpc-reply> element containing an <rpc-error> element.
− <rpc> element
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<get>
<filter type="subtree">
<ifm xmlns="http://www.huawei.com/netconf/vrp" content-version="1.0"
format-version="1.0">
<interfaces>
<interface>
</interface>
</interfaces>
</ifm>
</filter>
</get>
</rpc>
− <rpc-reply> element
<?xml version="1.0" encoding="UTF-8"?>
<rpc-reply message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
set-id="989">
<data>
<ifm xmlns="http://www.huawei.com/netconf/vrp" format-version="1.0"
content-version="1.0">
<interfaces>
<interface>
<ifIndex>2</ifIndex>
<ifName>Virtual-Template0</ifName>
<ifPhyType>Virtual-Template</ifPhyType>
<!-- additional <user> elements appear here... -->
</interface>
</interfaces>
</ifm>
</data>
</rpc-reply>
If the <get-next> operation for canceling the data query is successful, the server sends an
<rpc-reply> element containing an <ok> element. Otherwise, the server sends an
<rpc-reply> element containing an <rpc-error> element.
− <rpc> element
<rpc message-id="103" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<get-next xmlns="http://www.huawei.com/netconf/capability/base/1.0"
set-id="1">
<discard/>
</get-next>
</rpc>
− <rpc-reply> element
<rpc-reply message-id="103"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
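The interaction above amounts to a client-side loop: issue a <get>, read the set-id from the first <rpc-reply>, then either keep issuing <get-next> with that set-id or send a <discard/> to cancel. A schematic sketch of building the <get-next> request (the transport that actually sends the RPC is not shown, and the element layout follows the examples above):

```python
import xml.etree.ElementTree as ET

HW_BASE = "http://www.huawei.com/netconf/capability/base/1.0"

def build_get_next(message_id, set_id, discard=False):
    """Build a <get-next> RPC that continues (or cancels) a query whose
    first <rpc-reply> carried the given set-id."""
    rpc = ET.Element("rpc", {
        "message-id": str(message_id),
        "xmlns": "urn:ietf:params:xml:ns:netconf:base:1.0",
    })
    get_next = ET.SubElement(rpc, "get-next",
                             {"xmlns": HW_BASE, "set-id": str(set_id)})
    if discard:
        ET.SubElement(get_next, "discard")  # cancel the data query
    return ET.tostring(rpc, encoding="unicode")

# Continue the query started by the <get> whose reply carried set-id="989".
print(build_get_next(message_id=102, set_id=989))
# Cancel a query instead:
print(build_get_next(message_id=103, set_id=1, discard=True))
```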
− Commit-description
This capability indicates that a description of the committed data can be configured
when the data in the <candidate/> configuration database is committed to the
<running/> configuration database.
<capability>
http://www.huawei.com/netconf/capability/commit-description/1.0
</capability>
− Discard Commit
This capability indicates that the device can cancel an ongoing confirmed <commit>
operation, for example, before the confirmed-commit timeout expires or before the
<commit> operation is confirmed.
<capability> http://www.huawei.com/netconf/capability/discard-commit/1.0
</capability>
1.4.8.3 Applications
1.4.8.3.1 NETCONF-based Configuration and Management
Devices on a network are usually located in various regions, as shown in Figure 1-73.
Configuring and managing these devices at each site is difficult. In addition, if these devices
are manufactured by various vendors, and each vendor provides a unique set of device
management methods, configuring and managing these devices using traditional methods is
costly and inefficient. To resolve these issues, use NETCONF to remotely
configure, manage, and monitor devices.
You can use the Simple Network Management Protocol (SNMP) as an alternative to remotely configure,
manage, and monitor devices on a simple network.
1.4.9 DCN
1.4.9.1 Introduction
Definition
The data communication network (DCN) refers to the network on which network elements
(NEs) exchange Operation, Administration and Maintenance (OAM) information with the
network management system (NMS). It is constructed for communication between managing
and managed devices.
A DCN can be an external or internal DCN. In Figure 1-74, an external DCN is between the
NMS and an access point, and an internal DCN allows NEs to exchange OAM information
within it. In this document, internal DCNs are described.
Gateway network elements (GNEs) are connected to the NMS using protocols, for example,
the Simple Network Management Protocol (SNMP). GNEs are able to forward data at the
network or application layer. An NMS directly communicates with a GNE and uses the GNE
to deliver management information to non-GNEs.
Purpose
When constructing a large network, hardware engineers must install devices on site, and
software commissioning engineers must also configure the devices on site. This network
construction method requires significant human and material resources, causing high capital
expenditure (CAPEX) and operational expenditure (OPEX). If a new NE is deployed but the
NMS cannot detect the NE, the network administrator cannot manage or control the NE.
Plug-and-play can be used so that the NMS can automatically detect new NEs and remotely
commission the NEs to reduce CAPEX and OPEX.
The DCN technique offers a mechanism to implement plug-and-play. After an NE is installed
and started, an IP address (NEIP address) mapped to the NEID of the NE is automatically
generated. Each NE adds its NEID and NEIP address to a link state advertisement (LSA).
Then, Open Shortest Path First (OSPF) advertises all Type-10 LSAs to construct a core
routing table that contains mappings between NEIP addresses and NEIDs on each NE. After
detecting a new NE, the GNE reports the NE to the NMS. The NMS accesses the NE using
the IP address of the GNE and ID of the NE. To commission NEs, the NMS can use the GNE
to remotely manage the NEs on the network.
To improve the system security, it is recommended that the NEIP address be changed to the planned one.
Benefits
The NMS is able to manage NEs using service channels provided by the managed NEs. No
additional devices are required, reducing CAPEX and OPEX.
1.4.9.2 Principles
1.4.9.2.1 Basic Concepts
The devices on a data communication network (DCN) communicate with each other using the
Point-to-Point Protocol (PPP) through single-hop logical channels. Therefore, packets
transmitted on the DCN are encapsulated into PPP frames and forwarded through service
ports at the data link layer.
As shown in Figure 1-75, the NMS uses a GNE to manage non-GNEs in the following
process:
1. After the DCN function is enabled, a PPP channel and an OSPF neighbor relationship
are established between devices.
2. OSPF LSAs are sent between OSPF neighbors to learn host routes carrying NEIP
addresses to obtain mappings between NEIP addresses and NEIDs.
3. The GNE sends the mappings to the NMS. The NMS then accesses non-GNEs.
A core routing table is generated in the following process:
1. After PPP Network Control Protocol (NCP) negotiation is complete, a point-to-point
route is generated without network segment restrictions.
2. The OSPF neighbor relationship is established, and the OSPF route is generated for the
entire network.
3. NEIDs are advertised using OSPF LSAs, triggering the generation of a core routing
table.
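The core routing table generated by the steps above is, in effect, a set of NEID-to-NEIP mappings that the NMS consults before accessing a non-GNE through a GNE. A minimal sketch of that lookup (the NEIDs and addresses below are made up for illustration, not real defaults):

```python
# Sketch of a DCN core routing table: mappings between NEIDs and NEIP
# addresses, as learned from OSPF Type-10 LSAs. Values are illustrative.

core_routing_table = {
    0x090001: "128.9.0.1",   # hypothetical GNE
    0x090002: "128.9.0.2",   # hypothetical non-GNE
}

def neip_for_neid(neid):
    """Return the NEIP address for a NEID, as the NMS would resolve it
    before accessing a non-GNE through a GNE."""
    try:
        return core_routing_table[neid]
    except KeyError:
        raise LookupError(f"NEID 0x{neid:06X} not in core routing table")

print(neip_for_neid(0x090002))  # -> 128.9.0.2
```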
1.4.9.3 Applications
DCN Application
During network deployment, every network element (NE) must be configured with software
and commissioned after hardware installation to ensure that all NEs can communicate with
each other. As a large number of NEs are deployed, on-site deployment for each NE requires
significant manpower and is time-consuming. To reduce on-site deployment time and the
cost of operation and maintenance, a DCN can be deployed.
In Figure 1-76, to improve reliability, active and standby GNEs can be deployed. If the active
GNE fails, the NMS can gracefully switch this function to the standby GNE.
1. A DCN VLAN group is configured on the GNE, and the VLAN ID of the Dot1q
termination subinterface is the same as the DCN VLAN ID of the main interface.
2. The GNE sends DCN negotiation packets to VLANs in the DCN VLAN group.
3. The DCN negotiation packets are sent to different leaf nodes through VLLs.
4. NEs learn the DCN VLAN ID sent by the GNE and establish DCN connections with the
GNE.
Terms
Term Description
GNE Gateway network elements (GNEs) are able to forward
data at the network or application layer. The NMS can
use GNEs to manage remote NEs connected through
optical fibers.
Core routing table A core routing table consists of mappings between NEID
and NEIP addresses of NEs on a data communication
network (DCN). Before accessing a non-GNE through a
GNE, the NMS must search the core routing table for the
NEIP address of the non-GNE based on the destination
NEID.
1.4.10 LAD
1.4.10.1 Introduction
Definition
Link Automatic Discovery (LAD) is a Huawei proprietary protocol that discovers neighbors
at the link layer. LAD allows a device to issue link discovery requests as triggered by the
NMS or command lines. After the device receives link discovery replies, the device generates
neighbor information and saves it in the local MIB. The NMS can then query neighbor
information in the MIB and generate the topology of the entire network.
Purpose
Large-scale networks demand increased NMS capabilities, such as obtaining the topology
status of connected devices automatically and detecting configuration conflicts between
devices. Currently, most NMSs use an automated discovery function to trace changes in the
network topology but can only analyze the network-layer topology. Network-layer topology
information notifies you of basic events like the addition or deletion of devices, but gives you
no information about the interfaces used by one device to connect to other devices or the
location or network operation mode of a device.
LAD is developed to resolve these problems. LAD can identify the interfaces on a network
device and provide detailed information about connections between devices. LAD can also
display paths between clients, switches, routers, application servers, and network servers. The
detailed information provided by LAD can help efficiently locate network faults.
Benefits
LAD helps network administrators promptly obtain detailed network topology and changes in
the topology and monitor the network status in real time, improving security and stability for
network communication.
1.4.10.2 Principles
1.4.10.2.1 Basic Concepts
When Ethernet sub-interfaces are used on links, LAD packets are encapsulated into
Ethernet frames. Figure 1-79 shows the LAD packet format on Ethernet sub-interfaces.
Field Length Description
Tag 4 bytes Contains a 2-byte Ethernet Type field and a 2-byte VLAN field.
Type 2 bytes Packet type, fixed at 0x0806.
Field 6 bytes Four fields included, such as Hardware Type, fixed at 0xFF-FF.
When low-speed interfaces are used on links, LAD packets are encapsulated into PPP
frames. Figure 1-80 shows the LAD packet format on low-speed interfaces.
The Information field is the same in all three LAD packet formats, meaning that the LAD
data units are irrelevant to the link type. Figure 1-81 shows the format of the LAD data unit.
Link Reply packets: link discovery replies in response to the Link Detect packets sent by
remote devices. Link Reply packets carry the Send Link Info SubTLV (the same as that
in the received Link Detect packets) and Recv Link Info SubTLV. Figure 1-83 shows the
format of the Link Reply packet data unit.
1.4.10.2.2 Implementation
Background
To monitor the network status in real-time and to obtain detailed network topology and
changes in the topology, network administrators usually deploy the Link Layer Discovery
Protocol (LLDP) on live networks. LLDP, however, has limited applications due to the
following characteristics:
LLDP uniquely identifies a device by its IP address. IP addresses are expressed in dotted
decimal notation and therefore are not easy to maintain or manage, when compared with
NE IDs that are expressed in decimal integers.
LLDP is not supported on Ethernet sub-interfaces, Eth-Trunk interfaces, or low-speed
interfaces, and therefore cannot discover neighbors for these types of interfaces.
LLDP-enabled devices periodically broadcast LLDP packets, consuming many system
resources and even affecting the transmission of user services.
Link Automatic Discovery (LAD) addresses the preceding problems and is more flexible:
LAD uniquely identifies a device by an NE ID in decimal integers, which are easier to
maintain and manage.
LAD can discover neighbors for various types of interfaces and is therefore more widely
applicable than LLDP.
LAD is triggered by an NMS or command lines and therefore can be implemented as
you need.
Implementation
The following example uses the networking in Figure 1-84 to illustrate how LAD is
implemented.
Local and remote devices exchange LAD packets to learn each other's NE ID, slot ID, subcard ID,
interface number, and even each other's VLAN ID if sub-interfaces are used.
4. The NMS exchanges NETCONF packets with DeviceA to obtain DeviceA's local and
neighbor information and then generates the topology of the entire network.
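The neighbor information that LAD stores in the local MIB can be pictured as a set of records keyed by local interface. A minimal sketch, with hypothetical interface names and field values:

```python
# Sketch of LAD neighbor records as generated from link discovery
# replies and stored in the local MIB. Field names and values are
# illustrative, not the actual MIB object names.
from dataclasses import dataclass
from typing import Optional

@dataclass
class LadNeighbor:
    ne_id: int                      # decimal NE ID identifying the neighbor
    slot_id: int
    subcard_id: int
    interface_number: int
    vlan_id: Optional[int] = None   # learned only when sub-interfaces are used

# Neighbor table keyed by local interface name (hypothetical names).
neighbor_mib = {
    "GigabitEthernet0/1/0": LadNeighbor(ne_id=2, slot_id=1,
                                        subcard_id=0, interface_number=3),
    "GigabitEthernet0/1/1.100": LadNeighbor(ne_id=3, slot_id=2,
                                            subcard_id=0,
                                            interface_number=1, vlan_id=100),
}

# The NMS would query records like these over NETCONF to build the
# network topology.
for ifname, nbr in neighbor_mib.items():
    print(ifname, "->", nbr.ne_id)
```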
Benefits
After network administrators deploy LAD on devices, they can obtain information about all
links connected to the devices. LAD helps extend the network management scale. Network
administrators can obtain detailed network topology information and topology changes.
1.4.10.3 Applications
1.4.10.3.1 LAD Application in Single-Neighbor Networking
Networking Description
In single-neighbor networking, devices are directly connected, and each device interface
connects only to one neighbor. In Figure 1-85, DeviceA and DeviceB are directly connected,
and each interface on DeviceA and DeviceB connects only to one neighbor.
Feature Deployment
After enabling Link Automatic Discovery (LAD) on DeviceA, administrators can use the
NMS to obtain Layer 2 configurations of DeviceA and DeviceB, get a detailed network
topology, and determine whether a configuration conflict exists. LAD helps improve security
and stability for network communication.
Networking Description
In multi-neighbor networking, devices are connected over an unknown network, and each
device interface connects to one or more neighbors. In Figure 1-86, DeviceA, DeviceB, and
DeviceC are connected over a Layer 2 virtual private network (L2VPN). Devices on the
L2VPN may have Link Automatic Discovery (LAD) disabled or may not need to be managed
by the NMS, but they can still transparently transmit LAD packets. DeviceA has two
neighbors, DeviceB and DeviceC.
Feature Deployment
After enabling Link Automatic Discovery (LAD) on DeviceA, administrators can use the
NMS to obtain Layer 2 configurations of DeviceA, DeviceB, and DeviceC, get a detailed
network topology, and determine whether a configuration conflict exists. LAD helps ensure
security and stability for network communication.
Networking Description
On the network shown in Figure 1-87, an Eth-Trunk that comprises aggregated links exists
between DeviceA and DeviceB. Each aggregated link interface connects directly to only one
neighbor, as if it were connected in single-neighbor networking.
Feature Deployment
After enabling Link Automatic Discovery (LAD) on DeviceA, administrators can use the
NMS to obtain Layer 2 configurations of DeviceA and DeviceB, get a detailed network
topology, and determine whether a configuration conflict exists. LAD helps ensure security
and stability for network communication.
Terms
Term Definition
LAD A Huawei proprietary protocol that discovers neighbors at the link layer.
LAD allows a device to issue link discovery requests as triggered by the
NMS or command lines. After the device receives link discovery
replies, the device generates neighbor information and saves it in the
local MIB. The NMS can then query neighbor information in the MIB
and generate the topology of the entire network.
LLDP A Layer 2 discovery protocol defined in IEEE 802.1ab. LLDP provides
a standard link-layer discovery mode to encapsulate information about
the capabilities, management address, device ID, and interface ID of a
local device into LLDP packets and send the packets to neighbors. The
neighbors save the information received in a standard MIB to help the
NMS query and determine the communication status of links.
1.4.11 LLDP
1.4.11.1 Introduction
Definition
The Link Layer Discovery Protocol (LLDP), a Layer 2 discovery protocol defined in IEEE
802.1ab, provides a standard link-layer discovery method that encapsulates information about
the capabilities, management address, device ID, and interface ID of a local device into LLDP
packets and sends the packets to neighboring devices. These neighboring devices save the
information received in a standard management information base (MIB) to help the network
management system (NMS) query and determine the link communication status.
Purpose
Diversified network devices are deployed on a network, and configurations of these devices
are complicated. Therefore, NMSs must be able to meet increasing requirements for network
management capabilities, such as the capability to automatically obtain the topology status of
connected devices and the capability to detect configuration conflicts between devices. A
majority of NMSs use an automated discovery function to trace changes in the network
topology, but most can only analyze the network layer topology. Network layer topology
information notifies you of basic events, such as the addition or deletion of devices, but gives
you no information about the interfaces used to connect a device to other devices. The NMSs
can identify neither the device location nor the network operation mode.
LLDP is developed to resolve these problems. LLDP can identify interfaces on a network
device and provide detailed information about connections between devices. LLDP can also
display information about paths between clients, switches, routers, application servers, and
network servers, which helps you efficiently locate network faults.
Benefits
Deploying LLDP improves NMS capabilities. LLDP supplies the NMS with detailed
information about network topology and topology changes, and it detects inappropriate
configurations existing on the network. The information provided by LLDP helps
administrators monitor network status in real time to keep the network secure and stable.
1.4.11.2 Principles
1.4.11.2.1 Basic LLDP Concepts
LLDP Packets
LLDP packets are Ethernet packets that encapsulate LLDP data units (LLDPDUs). LLDP
supports two encapsulation modes: Ethernet II and Subnetwork Access Protocol (SNAP). The
NE20E supports the Ethernet II encapsulation mode.
Figure 1-88 shows the format of an Ethernet II LLDP packet.
Field Description
Destination MAC address A fixed multicast MAC address 0x0180-C200-000E.
Source MAC address A MAC address of an interface or a bridge MAC address of a device.
Type Packet type, fixed at 0x88CC.
LLDPDU Main body of an LLDP packet.
FCS Frame check sequence.
LLDPDU
An LLDPDU is a data unit encapsulated in the data field in an LLDP packet.
A device encapsulates local device information in type-length-value (TLV) format and
combines several TLVs in an LLDPDU for transmission. You can combine various TLVs to
form an LLDPDU as required. TLVs allow a device to advertise its own status and learn the
status of neighboring devices.
Figure 1-89 shows the LLDPDU format.
Each LLDPDU carries a maximum of 28 types of TLVs. Each LLDPDU starts with the
Chassis ID TLV, Port ID TLV, and Time to Live TLV, and ends with the End of LLDPDU
TLV.
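The mandatory ordering above can be illustrated by assembling a minimal LLDPDU. Each TLV header packs a 7-bit type and a 9-bit length; the type codes and subtypes below (Chassis ID type 1 with MAC-address subtype, Port ID type 2 with interface-name subtype) follow IEEE 802.1ab, while the MAC address and interface name are made-up example values:

```python
import struct

def tlv(tlv_type, value):
    """Pack one LLDP TLV: 7-bit type and 9-bit length in a 2-byte header."""
    header = (tlv_type << 9) | len(value)
    return struct.pack("!H", header) + value

# Mandatory TLVs in the required order.
chassis_id = tlv(1, b"\x04" + bytes.fromhex("0025a1000001"))  # subtype 4: MAC address
port_id    = tlv(2, b"\x05" + b"GigabitEthernet0/1/0")        # subtype 5: interface name
ttl        = tlv(3, struct.pack("!H", 120))                   # time to live: 120 seconds
end        = tlv(0, b"")                                      # End of LLDPDU TLV

lldpdu = chassis_id + port_id + ttl + end

# In the Ethernet II encapsulation described above, this LLDPDU follows
# the fixed multicast destination MAC 0180-C200-000E and Type 0x88CC.
print(len(lldpdu))  # -> 38
```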
TLV
A TLV is the smallest unit of an LLDPDU. It gives type, length, and other information for a
device object. For example, a device ID is carried in the Chassis ID TLV, an interface ID in
the Port ID TLV, and a network management address in the Management Address TLV.
LLDPDUs can encapsulate basic TLVs, TLVs defined by IEEE 802.1 working groups, TLVs
defined by IEEE 802.3 working groups, and Data Center Bridging Capabilities Exchange
Protocol (DCBX) TLVs.
Basic TLVs: are the basis for network device management.
Organizationally specific TLVs: include TLVs defined by IEEE 802.1 and those defined
by IEEE 802.3. They are used to enhance network device management. Use these TLVs
as needed.
a. TLVs defined by IEEE 802.1
The device searches the IP address list for loopback interfaces, management network interface, and
VLANIF interfaces in sequence and automatically selects the smallest IP address of the same interface
type as a management IP address.
Implementation
LLDP must be used together with MIBs. LLDP requires that each device interface be
provided with four MIBs. An LLDP local system MIB that stores status information of a local
device and an LLDP remote system MIB that stores status information of neighboring devices
are the most important. The status information includes the device ID, interface ID, system
name, system description, interface description, device capability, and network management
address.
LLDP requires that each device interface be provided with an LLDP agent to manage LLDP
operations. The LLDP agent performs the following functions:
Maintains information in the LLDP local system MIB.
Sends LLDP packets to notify neighboring devices of local device status.
Identifies and processes LLDP packets sent by neighboring devices and maintains
information in the LLDP remote system MIB.
Sends LLDP alarms to the NMS when detecting changes in information stored in the
LLDP local and remote system MIBs.
Working Mechanism
LLDP working modes
LLDP works in one of the following modes:
Tx mode: enables a device only to send LLDP packets.
Rx mode: enables a device only to receive LLDP packets.
Tx/Rx mode: enables a device to send and receive LLDP packets. The default working
mode is Tx/Rx.
Disabled mode: disables a device from sending or receiving LLDP packets.
When the LLDP working mode changes on an interface, the interface initializes the LLDP state
machines. To prevent repeated initializations caused by frequent working mode changes, the NE20E
supports an initial delay on the interface. When the working mode changes on the interface, the interface
initializes the LLDP state machines only after a configured delay interval elapses.
1.4.11.3 Applications
1.4.11.3.1 LLDP Applications in Single Neighbor Networking
Networking Description
In single neighbor networking, interfaces between devices, or between devices and media
endpoints (MEs), are directly connected without intermediate devices. Each device
interface is connected to only one remote neighboring device. In the single neighbor
networking shown in Figure 1-92, Device B is directly connected to Device A and the ME,
and each interface of Device A and Device B is connected only to a single remote neighboring
device.
Feature Deployment
After LLDP is configured on Device A and Device B, an administrator can use the NMS to
obtain Layer 2 configuration information about these devices, collect detailed network
topology information, and determine whether a configuration conflict exists. LLDP helps
make network communications more secure and stable.
Networking Description
In multi-neighbor networking, each interface is connected to multiple remote neighboring
devices. In the multi-neighbor networking shown in Figure 1-93, the network connected to
Device A, Device B, and Device C is unknown. Devices on this unknown network may have
LLDP disabled or may not need to be managed by the NMS, but they can still transparently
transmit LLDP packets. Interfaces on Device A, Device B, and Device C are connected to
multiple remote neighboring devices.
Feature Deployment
After LLDP is configured on Device A, Device B, and Device C, an administrator can use the
NMS to obtain Layer 2 configuration information about these devices, collect detailed
network topology information, and determine whether a configuration conflict exists. LLDP
helps make network communications more secure and stable.
Networking Description
In Figure 1-94, aggregated links exist between interfaces on Device A and Device B. Each
aggregated link interface is connected directly to another aggregated link interface, in the
same way as in single neighbor networking.
Feature Deployment
After LLDP is configured on Device A and Device B, an administrator can use the NMS to
obtain Layer 2 configuration information about these devices, collect detailed network
topology information, and determine whether a configuration conflict exists. LLDP helps
make network communications more secure and stable.
Terms
Term Description
1.4.12.1 Introduction
Definition
Synchronization is classified into the following types:
Clock synchronization, also called frequency synchronization
Clock synchronization maintains a strict relationship between signal frequencies or
between signal phases. Signals are transmitted at the same average rate within the valid
time. In this manner, all devices on a network run at the same rate.
On a digital communication network, a sender places a pulse signal in a specific timeslot
for transmission. A receiver needs to extract this pulse signal from this specific timeslot
to ensure that the sender and receiver communicate properly. The clocks on the sender
and receiver must also be synchronized to ensure smooth communication. Clock
synchronization enables the clocks on the sender and receiver to be synchronized.
Time synchronization, also called phase synchronization
Generally, the word "time" indicates either a moment or a time interval. A moment is a
transient in a period, whereas a time interval is the interval between two transients. Time
synchronization adjusts the internal clocks and moments of devices based on a received
time. The working principle of time synchronization is similar to that of clock
synchronization. When a time is adjusted, both the frequency and phase of a clock are
adjusted. The phase of this clock is represented by a moment in the form of year, month,
day, hour, minute, second, millisecond, microsecond, and nanosecond. Time
synchronization enables devices to receive discontinuous time reference information and
to adjust their times to synchronize times. Clock synchronization enables devices to trace
a clock source to synchronize frequencies.
Figure 1-95 Comparison between time synchronization and clock synchronization
The figure shows the difference between time synchronization and clock synchronization. In
time synchronization, watches A and B always keep the same time. In clock synchronization,
watches A and B keep different times, but the time difference between the two watches is a
constant value, for example, 6 hours.
Purpose
Clock synchronization aims to limit the clock frequency or phase difference between network
elements (NEs) on a digital communication network within an allowable range. Information is
coded into digital pulse signals using pulse code modulation (PCM) and transmitted on a
digital communication network. If the clock frequencies of two digital switching devices are
different, or digital bit streams are corrupted due to interference during transmission, phase
drift or jitter occurs. Consequently, the buffer of the digital switching system experiences data
loss or duplication, resulting in incorrect transmission of bit streams. If the clock frequency or
phase difference exceeds an allowable range, bit errors or jitter may occur. As a result,
network transmission performance deteriorates.
1.4.12.2 Principles
1.4.12.2.1 Basic Concepts
Clock Source
The device that provides clock signals for another device is called a clock source. A device
can have multiple clock sources. They are classified into the following types:
External clock source
An external clock source traces a high-level clock through the clock interface provided
by a clock board.
Line clock source
You are advised to configure the automatic clock source selection mode. In this mode, the NE20E
dynamically selects an optimal clock source based on clock source quality.
SSM
The International Telecommunication Union-Telecommunication Standardization Sector
(ITU-T) defined a synchronous status message (SSM) to identify the quality level of a
synchronization source on synchronous digital hierarchy (SDH) networks. As stipulated by
the ITU-T, the four spare bits in one of the five Sa bytes in a 2 Mbit/s bit stream are used to
transmit the SSM value. The use of the SSM value in clock source selection brings the
following benefits:
Improves synchronization network performance.
Prevents timing loops.
Achieves synchronization on networks with different structures.
Enhances synchronization network reliability.
Extended SSM
The extended SSM function enables clock IDs to participate in automatic clock source
selection. This function prevents clock loops.
When the extended SSM function is enabled, the NE20E does not allow clock IDs to
participate in automatic clock source selection in either of the following cases:
The clock ID of a clock source is the same as the clock ID configured on the NE20E.
The clock ID of a clock source is 0.
Pseudo Synchronization
In pseudo synchronization mode, each switching site has its own clock with very high
accuracy and stability, and clock synchronization is not carried out among the switching sites.
There is a small difference in the clock frequency or phase among the clocks of the switching
sites, which does not affect service transmission and can therefore be ignored. This is the
reason why the mode is called pseudo synchronization.
Pseudo synchronization applies to digital networks between countries. Caesium clocks are
usually used on digital networks inside a country.
Master/Slave Synchronization
In master/slave synchronization mode, a master clock of high accuracy is set on a network
and traced by every site. Every network element traces the higher-level clock in the same site
or sub-site. Every sub-site traces the higher-level clock in its own site.
Master/Slave synchronization is classified into two types: direct master/slave synchronization
and hierarchical master/slave synchronization.
In direct master/slave synchronization mode shown in Figure 1-96, all slave clocks
synchronize with the master clock. Direct master/slave synchronization applies to simple
networks.
In hierarchical master/slave synchronization mode shown in Figure 1-97, the devices are
classified into three levels. The level-2 slave clock synchronizes with the level-1 reference
clock, and the level-3 slave clock synchronizes with the Level-2 slave clock. Hierarchical
master/slave synchronization applies to large complex networks.
Hold-in state
When all reference clocks are lost, the slave clock enters the hold-in state and uses the
last frequency stored before the reference clocks were lost. In addition, the slave clock
provides clock signals that conform to the source reference clock to ensure that there is a
small difference between the frequency of the provided clock signals and that of the
reference clock.
Free running state
After losing all external reference clocks, the slave clock loses the clock reference
memory or retains the hold-in state for a long time. As a result, the oscillator inside the
slave clock works in the free running state.
The accuracy of the clock in the hold-in state cannot be maintained for a long time because of
the drift of the inherent oscillation frequency. Therefore, the accuracy of the clock in the
hold-in state is inferior to that of the clock in the trace state.
As shown in Figure 1-99, Router A traces the BITS clock. Routers A and B are connected
through an Ethernet link. Routers B and C are also connected through an Ethernet link, and
Router C traces Router B's clock. The clocks of the three routers synchronize with the BITS
clock.
Owing to the long transmission distance of optical fibers, synchronizing clock signals through
synchronous Ethernet links has become the most common networking mode for clock
synchronization.
As shown in Figure 1-100, on Device A that serves as the master clock, the active clock
board is configured to trace, Device B is configured to trace the clock of Device A, and
Device C is configured to trace the clock of Device B.
When all devices on the entire network trace Router A's clock, there is no reference clock
on the network if Router A fails. As a result, none of the routers has an accurate
reference clock. The routers may trace a reference clock, but the reference clock
accuracy cannot meet synchronization requirements.
When the clock board is powered on, the default SSM levels of all reference sources are
Unknown. The sequence of the SSM levels from high to low is PRC, SSUA, SSUB, SEC,
UNKNOWN, and DNU. If the SSM level of a clock source is DNU and the SSM level
participates in the selection of a clock source, the clock source is not selected during
protection switching.
The SSM level of output signals is determined by the traced clock source. When the
clock works in the trace state, the SSM level of output signals and that of the traced
clock source are the same. When the clock does not work in the trace state, the SSM
level of output signals is SEC.
For a line clock source, the SSM can be extracted from an interface board and reported
to the IPU. The IPU then sends the SSM to the clock board. The IPU can also forcibly
set the SSM of the line clock source.
For the BITS clock source of the clock module:
− If the signal is 2.048 Mbit/s, the clock module can extract the SSM from the signal.
− If the signal is 2.048 MHz, the SSM level can be set manually.
The router can only select an SSM value listed in Table 1-32. For values not listed, the router processes
them as DNU.
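The SSM-based selection rules above can be illustrated with a minimal Python sketch. This is a hypothetical illustration only; the function and source names are invented and do not reflect the NE20E implementation.

```python
# Hypothetical sketch of SSM-based clock source selection.
# Quality order, from high to low: PRC, SSUA, SSUB, SEC, UNKNOWN, DNU.
SSM_RANK = {"PRC": 0, "SSUA": 1, "SSUB": 2, "SEC": 3, "UNKNOWN": 4, "DNU": 5}

def normalize(ssm_level):
    """SSM values not listed in the table are processed as DNU."""
    return ssm_level if ssm_level in SSM_RANK else "DNU"

def select_clock_source(sources):
    """Pick the highest-quality source; DNU sources are never selected.
    sources: list of (name, ssm_level) tuples."""
    usable = [(name, normalize(level)) for name, level in sources]
    usable = [(name, level) for name, level in usable if level != "DNU"]
    if not usable:
        return None  # no usable reference: clock enters hold-in or free running
    return min(usable, key=lambda item: SSM_RANK[item[1]])[0]

# A PRC-level line source beats a BITS source at SSUA level; DNU is excluded.
print(select_clock_source([("bits0", "SSUA"), ("line1", "PRC"), ("line2", "DNU")]))  # line1
```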
1.4.12.2.5 Impact
Definition
The 1588 adaptive clock recovery (ACR) algorithm is used to carry out clock (frequency)
synchronization between the NE20E and clock servers by exchanging 1588v2 messages over
a clock link that is set up by sending Layer 3 unicast packets.
Unlike 1588v2 that achieves frequency synchronization only when all devices on a network
support 1588v2, 1588 ACR is capable of implementing frequency synchronization on a
network with both 1588v2-aware devices and 1588v2-unaware devices.
After 1588 ACR is enabled on a server, the server provides 1588 ACR frequency
synchronization services for clients.
1588 ACR records PDV performance statistics on the CF card. These statistics indicate the
delay and jitter of packets, not the content of the packets.
Purpose
All-IP has become the trend for future networks and services. Therefore, traditional networks
based on the Synchronous Digital Hierarchy (SDH) have to overcome various constraints
before migrating to IP packet-switched networks. Transmitting Time Division Multiplexing
(TDM) services over IP networks presents a major technological challenge. TDM services are
classified into two types: voice services and clock synchronization services. With the
development of VoIP, technologies of transmitting voice services over an IP network have
become mature and have been extensively used. However, development of technologies of
transmitting clock synchronization services over an IP network is still under way.
1588v2 is a software-based technology that carries out time and frequency synchronization.
To achieve higher accuracy, 1588v2 requires that all devices on a network support 1588v2; if
not, frequency synchronization cannot be achieved.
Derived from 1588v2, 1588 ACR implements frequency synchronization with clock servers
on a network with both 1588v2-aware devices and 1588v2-unaware devices. Therefore, in the
situation where only frequency synchronization is required, 1588 ACR is more applicable
than 1588v2.
Benefits
This feature brings the following benefits to operators:
Frequency synchronization can be achieved on networks with both 1588v2-aware and
1588v2-unaware devices, reducing the costs of network construction.
Operators can provide more services that can meet subscribers' requirements for
frequency synchronization.
This feature brings the following benefits to users:
N/A.
1.4.13.2 Principles
1.4.13.2.1 Basic Principles of 1588 ACR
1588 ACR aims to synchronize the frequencies of routers (clients) with those of clock servers
(servers).
1588 ACR sends Layer 3 unicast packets to establish a clock link between a client and a
server to exchange 1588v2 messages. 1588 ACR obtains a clock offset by comparing
timestamps carried in the 1588v2 messages, which enables the client to synchronize
frequencies with the server.
One-way mode
a. The server sends the client 1588v2 messages at t1 and t1' and time-stamps the
messages with t1 and t1'.
b. The client receives the 1588v2 messages at t2 and t2' and time-stamps the messages
with t2 and t2'.
t1 and t1' are the clock time of the server, and t2 and t2' are the clock time of the client.
By comparing the sending time on the server and the receiving time on the client, 1588
ACR calculates a frequency offset between the server and client and then implements
frequency synchronization. For example, if the result of the formula (t2 - t1)/(t2' - t1') is
1, frequencies on the server and client are the same; if not, the frequency of the client
needs to be adjusted so that it is the same as the frequency of the server.
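The one-way comparison above can be worked through numerically. The following minimal Python sketch uses hypothetical timestamp values.

```python
# One-way frequency check per the formula above: (t2 - t1)/(t2' - t1').
# A ratio of 1 means the client frequency matches the server frequency.
def frequency_ratio(t1, t1p, t2, t2p):
    return (t2 - t1) / (t2p - t1p)

# Server sends at t1 = 100.0 and t1' = 200.0 (server time);
# client receives at t2 = 105.0 and t2' = 205.0 (client time).
# The offset stays constant (5.0), so the frequencies are the same.
print(frequency_ratio(100.0, 200.0, 105.0, 205.0))  # 1.0
```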
Two-way mode
a. The server clock sends a 1588 Sync packet carrying the timestamp t1 to the client
clock at t1.
b. The client clock receives the Sync packet from the server clock at t2.
c. The client clock sends a 1588 Delay_Req packet to the server clock at t3.
d. The server clock receives the Delay_Req packet from the client clock at t4, and
sends a Delay_Resp packet to the client clock.
The same calculation method is used in the two-way and one-way modes. The t1/t2 timestamps
are compared with the t3/t4 timestamps, and the group of data with less jitter is used for the
calculation. Under the same network conditions, tracing the clock signals in the direction with
less jitter is more precise than always tracing a single fixed direction. The two-way mode
therefore offers better frequency recovery accuracy and higher reliability than the one-way
mode. If adequate bandwidth is available, clock synchronization in two-way mode is
recommended for frequency synchronization when deploying 1588 ACR.
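The choice between the two directions can be sketched as follows. The sample values are hypothetical, and the variance test is only one plausible way to quantify jitter.

```python
import statistics

# Two-way mode: compare the jitter of the forward offsets (t2 - t1) with
# that of the reverse offsets (t4 - t3) and trace the steadier direction.
def pick_direction(forward_offsets, reverse_offsets):
    if statistics.pvariance(forward_offsets) <= statistics.pvariance(reverse_offsets):
        return "forward"  # master-to-slave samples have less jitter
    return "reverse"      # slave-to-master samples have less jitter

print(pick_direction([5.0, 5.1, 5.0, 5.1], [7.0, 9.5, 6.2, 8.8]))  # forward
```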
Duration Mechanism
On a 1588 ACR client, you can configure a duration for Announce, Sync, and Delay_Resp
packets. The duration value is carried in the TLV field of a signaling negotiation packet
and sent to the server.
Generally, the client sends a packet to renegotiate with the server before the duration times
out so that the server can continue to provide the client with synchronization services.
If the link connected to the client goes Down or fails, the client cannot renegotiate with the
server. When the duration times out, the server stops sending Sync packets to the client.
1.4.13.3 Applications
On the preceding network, CSGs support 1588 ACR and function as clients to initiate requests
for Layer 3 unicast connections to the upstream IPCLK server. The CSGs then exchange
1588v2 messages with the IPCLK server over the connections, achieving frequency recovery.
BITS1 and BITS2 are configured as clock servers for the CSGs to provide protection.
One CSG sends line clock signals carrying frequency information to NodeB1 along an E1 link.
The other CSG transmits NodeB2 frequency information either along a synchronous Ethernet
link or by sending 1588v2 messages. In this manner, both NodeBs connected to the CSGs can
achieve frequency synchronization.
Terms
Term: Synchronization
On a modern communications network, in most cases, the proper functioning of
telecommunications services requires network clock synchronization, meaning that the
frequency offset or time difference between devices must be kept in an acceptable range.
Network clock synchronization includes frequency synchronization and time
synchronization.
Term: Time synchronization
Time synchronization, also called phase synchronization, refers to the consistency of both
frequencies and phases between signals. This means that the phase offset between signals
is always 0.
Term: Frequency synchronization
Frequency synchronization, also called clock synchronization, refers to a strict
relationship between signals based on a constant frequency offset or a constant phase
offset, in which signals are sent or received at the same average rate in a valid instance.
In this manner, all devices on the communications network operate at the same rate; that
is, the phase difference between signals remains a fixed value.
Term: IEEE 1588v2 PTP
1588v2, defined by the Institute of Electrical and Electronics Engineers (IEEE), is the
standard for the Precision Clock Synchronization Protocol for Networked Measurement
and Control Systems, called the Precision Time Protocol (PTP) for short.
Abbreviations
Abbreviation Full Spelling
PTP (1588v2) Precision Time Protocol
BITS Building Integrated Timing Supply
BMC Best Master Clock
ACR Adaptive Clock Recovery
Definition
Circuit emulation service (CES) adaptive clock recovery (ACR) clock synchronization
implements adaptive clock frequency synchronization. CES ACR clock synchronization uses
special circuit emulation headers to encapsulate time division multiplexing (TDM) service
packets that carry clock frequency information and transmits these packets over a packet
switched network (PSN).
Purpose
If a clock frequency is out of the allowed error range, problems such as bit errors and jitter
occur. As a result, network transmission performance deteriorates. CES ACR uses the
adaptive clock recovery algorithm to synchronize clock frequencies and confines the clock
frequencies of all network elements (NEs) on a digital network to the allowed error range,
enhancing network transmission stability.
CES ACR applies when the intermediate PSN does not support clock synchronization at the
physical layer and clock frequency information needs to be transmitted using TDM services.
1.4.14.2 Principles
1.4.14.2.1 Basic Concepts
CES
The CES technology originated from the asynchronous transfer mode (ATM) network. CES
uses emulated circuits to encapsulate circuit service data into ATM cells and transmits these
cells over the ATM network. Later, circuit emulation was used on the Metro Ethernet to
transparently transmit TDM and other circuit switched services.
CES uses special circuit emulation headers to encapsulate TDM service packets that carry
clock frequency information and transmits these packets over the PSN.
CES ACR
The CES technology generally uses the adaptive clock recovery algorithm to synchronize
clock frequencies. If an Ethernet transmits TDM services over emulated circuits, the Ethernet
uses the adaptive clock recovery algorithm to extract clock synchronization information from
data packets.
1.4.14.3 Applications
CES ACR applies to scenarios in which TDM services traverse a packet switched network
(PSN) that does not support clock synchronization and the transmit TDM service clock must
be used to restore TDM services at the receive end.
On the network shown in Figure 1-105, the clock source sends clock frequency information to
CE1. CE1 encapsulates the clock frequency information into TDM services and transmits the
services over the intermediate PSN through routers. Upon receipt, the router connected to the
slave clock uses CES ACR to recover the clock frequency. In actual applications, multiple
E1/T1 interfaces can belong to the same clock recovery domain. The system uses the PW
source selection algorithm to select a PW as the primary PW and uses the primary PW to
recover clocks. If the primary PW fails, the system automatically selects the next available
PW as the primary PW to recover clocks. If multiple PWs are configured to belong to the
same clock domain, the TDM services carried over these PWs must also have the same clock
source. Otherwise, packet loss or frequency deviation adjustment may occur.
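The primary-PW failover behavior described above can be sketched as follows. This is a hypothetical illustration; the actual PW source selection algorithm is not specified in this document.

```python
# Select the PW used for clock recovery: use the current primary PW if it
# is up, otherwise fall back to the next available PW in the clock domain.
def select_primary_pw(pws):
    """pws: ordered list of (name, is_up); returns the first usable PW."""
    for name, is_up in pws:
        if is_up:
            return name
    return None  # no PW available: clock recovery stops

print(select_primary_pw([("pw1", False), ("pw2", True), ("pw3", True)]))  # pw2
```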
Abbreviations
Abbreviation Full Spelling
1.4.15.1 Introduction
Definition
Synchronization
This is the process of ensuring that the frequency offset or time difference between
devices is kept within a reasonable range. In a modern communications network, most
telecommunications services require network clock synchronization in order to function
properly.
Figure 1-106 shows the differences between time synchronization and frequency
synchronization. If Watch A and Watch B always have the same time, they are in time
synchronization. If Watch A and Watch B have different time, but the time offset remains
constant, for example, 6 hours, they are in frequency synchronization.
IEEE 1588
IEEE 1588 is defined by the Institute of Electrical and Electronics Engineers (IEEE) as
Precision Clock Synchronization Protocol (PTP) for networked measurement and control
systems. It is called the Precision Time Protocol (PTP) for short.
IEEE 1588v1, released in 2002, applies to the industrial automation and test and
measurement fields. With the development of IP networks and the popularization of 3G
Purpose
Data communications networks do not require time or frequency synchronization and,
therefore, routers on such networks do not need to support time or frequency synchronization.
On IP radio access networks (RANs), time or frequency needs to be synchronized among base
transceiver stations (BTSs). Therefore, routers on IP RANs are required to support time or
frequency synchronization.
Frequency synchronization between BTSs on an IP RAN requires that frequencies between
BTSs be synchronized to a certain level of accuracy; otherwise, calls may be dropped during
mobile handoffs. Some wireless standards require both frequency and time synchronization.
Table 1-33 shows the requirements of wireless standards for time synchronization and
frequency accuracy.
Table 1-33 Requirements of wireless standards for time synchronization and frequency accuracy
Benefits
This feature brings the following benefits to operators:
Construction and maintenance costs for time synchronization on wireless networks are
reduced.
Time synchronization and frequency synchronization on wireless networks are
independent of GPS, providing a higher level of strategic security.
High-accuracy NQA-based unidirectional delay measurement is supported.
Y.1731 and IPFPM are supported.
Concepts of G.8275.1
ITU-T G.8275.1 defines the precision time protocol telecom profile for phase/time
synchronization with full timing support from the network. G.8275.1 is defined as a time
synchronization protocol.
A physical network can be logically divided into multiple clock domains. Each clock domain
has its own independent synchronous time, with which clocks in the same domain
synchronize.
A node on a time synchronization network is called a clock. G.8275.1 defines three types of
clocks:
A Telecom grandmaster (T-GM) can only be the master clock that provides time
synchronization.
A Telecom-boundary clock (T-BC) has more than one G.8275.1 interface. One interface
of the T-BC synchronizes time signals with an upstream clock, and the other interfaces
distribute the time signals to downstream clocks.
A Telecom time slave clock (T-TSC) can only be a slave clock that synchronizes with
the time of an upstream device.
1.4.15.2 Principles
1.4.15.2.1 Basic Concepts
Clock Domain
Logically, a physical network can be divided into multiple clock domains. Each clock domain
has a reference time with which all devices in the domain are synchronized. Each clock
domain has its own reference time and these times are independent of one another.
A device can transparently transmit time signals from multiple clock domains over a bearer
network to provide specific reference times for multiple mobile operator networks. The device,
however, can join only one clock domain and can synchronize only with the synchronization
time of that clock domain.
Clock Node
Each node on a time synchronization network is a clock. The 1588v2 protocol defines the
following types of clocks:
Ordinary clock
An ordinary clock (OC) has only one 1588v2 clock interface (a clock interface enabled
with 1588v2) through which the OC synchronizes with an upstream node or distributes
time signals to downstream nodes.
Boundary clock
A boundary clock (BC) has multiple 1588v2 clock interfaces, one of which is used to
synchronize with an upstream node. The other interfaces are used to distribute time
signals to downstream nodes.
The following is an example of a special case: If a device obtains the standard time from
a BITS through an external time interface (which is not enabled with 1588v2) and then
distributes time signals through two 1588v2 enabled clock interfaces to downstream
nodes, this device is a BC node, as it has more than one 1588v2 clock interface.
Transparent clock
A transparent clock (TC) does not synchronize the time with other devices (unlike BCs
and OCs) but has multiple 1588v2 clock interfaces through which it transmits 1588v2
messages and corrects message transmission delays.
TCs are classified into end-to-end (E2E) TCs and peer-to-peer (P2P) TCs.
TC+OC
A TC+OC is a special TC that has the functions of both a TC and an OC. On interfaces
having TC attributes, the TC+OC can transparently transmit 1588v2 messages and
correct message transmission delays.
Figure 1-107 Location of the TC, OC, and TC+OC on a time synchronization network
Clock nodes exchange information such as clock quality and the distance to the GM. After
this information has been gathered, one of the clock nodes is selected to be the GM, the
interface to be used for transmitting clock signals issued by the GM is selected, and the
master and slave relationships between nodes are specified. A loop-free, GM-rooted
spanning tree that covers the entire network is established after completion of the process.
If a master-slave relationship has been set up between two nodes, the master node periodically
sends Announce messages to the slave node. If the slave node does not receive an Announce
message from the master node within a specified period of time, it terminates the current
master-slave relationship and finds another interface with which to establish a new
master-slave relationship.
Grandmaster
A time synchronization network is like a GM-rooted spanning tree. All other nodes
synchronize with the GM.
Master/Slave
When a pair of nodes perform time synchronization, the upstream node distributing the
reference time signals is the master node and the downstream node receiving the reference
time signals is the slave node.
In practice, the delay and jitter on the network need to be taken into account, and the sending
and receiving delays are not always identical. Therefore, message-based time synchronization,
namely, 1588v2 and NTP, cannot guarantee high synchronization accuracy. For example, NTP
can only provide the synchronization accuracy of 10 to 100 ms.
1588v2 and NTP differ in implementation.
NTP runs at the application layer, for example, on the IPU of the NE20E. The delay measured
by NTP, in addition to the link delay, includes various internal processing delays, such as the
internal congestion queuing, software scheduling, and software processing delays. These
make the message transmission delay unstable, causing message transmission delays in two
directions to be asymmetric. As a result, the accuracy of NTP-based time synchronization is
low.
1588v2 presumes that the link delay is constant or changes so slowly that the change between
two synchronization processes can be ignored, and the message transmission delays in two
directions on a link are identical. Messages are time-stamped for delay measurement at the
physical layer of the interface board. This ensures that time synchronization based on the
obtained link delay is extremely accurate.
1588v2 defines two modes for the delay measurement and time synchronization mechanisms,
namely, Delay and Peer Delay (PDelay).
Delay Mode
The Delay mode is applied to end-to-end (E2E) delay measurement. Figure 1-108 shows the
delay measurement in Delay mode.
As shown in Figure 1-108, t-sm and t-ms represent the sending and receiving delays respectively and are
presumed to be identical. If they are different, they should be made identical through asymmetric delay
correction. For details about asymmetric delay correction, see the following part of this section.
Follow_Up messages are used in two-step mode. Only the one-step mode is described in this part, so
Follow_Up messages are not mentioned. For details about the two-step mode, see the following part of
this section.
A master node periodically sends a Sync message carrying the sending timestamp t1 to the
slave node. When the slave node receives the Sync message, it time-stamps t2 to the message.
The slave node periodically sends the Delay_Req message carrying the sending timestamp t3
to the master node. When the master node receives the Delay_Req message, it time-stamps t4
to the message and returns a Delay_Resp message to the slave node.
The slave node receives a set of timestamps, including t1, t2, t3, and t4. Other elements
affecting the link delay are ignored.
The sum of the message transmission delays in the two directions of the link between the
master and slave nodes equals (t4 - t1) - (t3 - t2). If the message transmission delays in both
directions are identical, the message transmission delay in one direction equals
[(t4 - t1) - (t3 - t2)]/2.
The time offset between the slave and master nodes equals [(t2 - t1) - (t4 - t3)]/2.
Based on the time offset, the slave node synchronizes with the master node.
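The Delay-mode arithmetic can be checked with a short Python sketch that applies the standard PTP relations, one-way delay = [(t4 - t1) - (t3 - t2)]/2 and offset = [(t2 - t1) - (t4 - t3)]/2, to hypothetical timestamps.

```python
# Delay-mode calculation on a symmetric link (hypothetical timestamps).
def delay_mode(t1, t2, t3, t4):
    one_way_delay = ((t4 - t1) - (t3 - t2)) / 2   # half the round trip
    offset = ((t2 - t1) - (t4 - t3)) / 2          # slave time minus master time
    return one_way_delay, offset

# Link delay 2.0, slave clock ahead of the master by 1.0:
# Sync sent at t1 = 10.0, received at t2 = 13.0;
# Delay_Req sent at t3 = 20.0, received at t4 = 21.0.
print(delay_mode(10.0, 13.0, 20.0, 21.0))  # (2.0, 1.0)
```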
As shown in Figure 1-109, time synchronization is repeatedly performed to ensure constant
synchronization between the master and slave nodes.
The BC and OC can be directly connected as shown in Figure 1-109. Alternatively, they can
be connected through other devices, but these devices must be TCs to ensure the accuracy of
time synchronization. The TC only transparently transmits 1588v2 messages and corrects the
message transmission delay (which requires that the TC identify these 1588v2 messages).
To ensure the high accuracy of 1588v2 time synchronization, it is required that the message
transmission delays in two directions between master and slave nodes be stable. Usually, the
link delay is stable but the transmission delay on devices is unstable. Therefore, if two nodes
performing time synchronization are connected through forwarding devices, the time
synchronization accuracy cannot be guaranteed. The solution to the problem is to perform the
transmission delay correction on these forwarding devices, which requires that the forwarding
devices be TCs.
Figure 1-110 shows how the transmission delay correction is performed on a TC.
The TC performs the transmission delay correction by adding the time it takes to transmit the
message to the Correction field of a 1588v2 message. This means that the TC deducts the
receiving timestamp of the 1588v2 message on its inbound interface and adds the sending
timestamp to the 1588v2 message on its outbound interface.
In this manner, the 1588v2 messages exchanged between the master and slave nodes, when
passing through multiple TCs, carry the residence delays of all TCs in the Correction field.
When the value of the Correction field is deducted, the remaining value is the link delay,
ensuring high-accuracy time synchronization.
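The Correction-field mechanism can be sketched as follows. The values and class names are hypothetical; in real 1588v2, the correction travels in the message header.

```python
# E2E TC residence-time correction: each TC adds the time a message spends
# inside it to the Correction field, so the slave can subtract it later.
class SyncMessage:
    def __init__(self, t1):
        self.t1 = t1           # master's sending timestamp
        self.correction = 0.0  # accumulated TC residence time

def tc_forward(msg, t_in, t_out):
    """A TC stamps the message on ingress/egress and accumulates the difference."""
    msg.correction += t_out - t_in
    return msg

msg = SyncMessage(t1=100.0)
msg = tc_forward(msg, t_in=100.4, t_out=100.9)  # 0.5 residence on TC1
msg = tc_forward(msg, t_in=101.1, t_out=101.3)  # 0.2 residence on TC2
t2 = 101.5
remaining = (t2 - msg.t1) - msg.correction  # link delay (plus any clock offset)
print(round(msg.correction, 1), round(remaining, 1))  # 0.7 0.8
```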
A TC that records the transmission delay from end to end as described above is the E2E TC.
Time synchronization in Delay mode can be applied only to E2E TCs. Figure 1-111 shows
how the BC, OC, and E2E TC are connected and how 1588v2 operates.
Figure 1-111 Networking diagram of the BC, OC, and E2E TC and the 1588v2 operation
PDelay Mode
When performing time synchronization in PDelay mode, the slave node deducts both the
message transmission delay and upstream link delay. This requires that adjacent devices
perform the delay measurement in PDelay mode to enable each device on the link to know its
upstream link delay. Figure 1-112 shows the delay measurement in PDelay mode.
As shown in Figure 1-112, t-sm and t-ms represent the sending and receiving delays respectively and are
presumed to be identical. If they are different, they should be made identical through asymmetric delay
correction. For details about asymmetric delay correction, see the following part of this section.
Follow_Up messages are used in two-step mode. Only the one-step mode is described in this part, so
Follow_Up messages are not mentioned. For details about the two-step mode, see the following part of
this section.
Node 1 periodically sends a PDelay_Req message carrying the sending timestamp t1 to node
2. When the PDelay_Req message is received, node 2 time-stamps t2 to the PDelay_Req
message. Then, node 2 sends a PDelay_Resp message carrying the sending timestamp t3 to
node 1. When the PDelay_Resp message is received, node 1 time-stamps t4 to the
PDelay_Resp message.
Node 1 obtains a set of timestamps, including t1, t2, t3, and t4. Other elements affecting the
link delay are ignored.
The message transmission delays in two directions on the link between node 1 and node 2
equal (t4 - t1) - (t3 - t2).
If the message transmission delays in the two directions on the link between node 1 and node
2 are identical, the message transmission delay in one direction equals [(t4 - t1) - (t3 - t2)]/2.
The delay measurement in PDelay mode does not differentiate between the master and slave
nodes. All nodes send PDelay messages to their adjacent nodes to calculate adjacent link delay.
This calculation process repeats and the message transmission delay in one direction is
updated accordingly.
The delay measurement in PDelay mode does not trigger time synchronization. To implement
time synchronization, the master node needs to periodically send Sync messages to the slave
node and the slave node receives the t1 and t2 timestamps. The slave node then deducts the
message transmission delay on the link from the master node to the slave node. The obtained
t2-t1-CorrectionField is the time offset between the slave and master nodes. The slave node
uses the time offset to synchronize with the master node. Figure 1-113 shows how time
synchronization is implemented in PDelay mode in the scenario where the BC and OC are
directly connected.
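A minimal numeric sketch of the PDelay-mode steps above follows; the timestamps are hypothetical and the function names are invented.

```python
# Each node measures its upstream link delay with PDelay messages,
# assuming symmetric delays in the two directions.
def pdelay_link_delay(t1, t2, t3, t4):
    return ((t4 - t1) - (t3 - t2)) / 2

# The slave then corrects the Sync timestamps with the measured link delay
# and the accumulated Correction field to obtain its offset from the master.
def pdelay_offset(t1_sync, t2_sync, link_delay, correction):
    return (t2_sync - t1_sync) - link_delay - correction

d = pdelay_link_delay(0, 12, 50, 62)       # 12.0 time units each way
print(d, pdelay_offset(100, 117, d, 0.0))  # 12.0 5.0
```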
Figure 1-115 shows how the BC, OC, and P2P TC are connected and how 1588v2 operates.
Figure 1-115 Schematic diagram of transmission delay correction in PDelay mode on a P2P TC
One-Step/Two-Step
In one-step mode, both the Sync messages for time synchronization in Delay mode and
PDelay_Resp messages for time synchronization in PDelay mode are stamped with a sending
time.
In two-step mode, Sync messages for time synchronization in Delay mode and PDelay_Resp
messages for time synchronization in PDelay mode are not stamped with a sending time. The
sending time is carried in Follow_Up and PDelay_Resp_Follow_Up messages.
Asymmetric Correction
Theoretically, 1588v2 requires the message transmission delays in two directions on a link to
be symmetrical. Otherwise, the algorithms of 1588v2 time synchronization cannot be
implemented. In practice, however, the message transmission delays in the two directions of a
link may be asymmetric due to the attributes of the link or a device; for example, the delays
between receiving a message and time-stamping it may differ in the two directions. For such
cases, 1588v2 provides an asymmetric delay correction mechanism, as shown in Figure 1-116.
Usually, t-ms is identical to t-sm. If they are different, the user can set a delay offset
between them, provided that the offset is constant and can be obtained by a measurement
device. 1588v2 then performs the time synchronization calculation according to the
asymmetric correction value.
In this manner, a high level of time synchronization accuracy can be achieved on an
asymmetric-delay link.
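The correction can be expressed with the IEEE 1588 delayAsymmetry convention, where t-ms = meanPathDelay + delayAsymmetry and t-sm = meanPathDelay - delayAsymmetry. The values below are hypothetical.

```python
# Delay-mode offset with asymmetric correction: the constant, externally
# measured delayAsymmetry is subtracted from the naive symmetric result.
def offset_with_asymmetry(t1, t2, t3, t4, delay_asymmetry):
    return ((t2 - t1) - (t4 - t3)) / 2 - delay_asymmetry

# Mean path delay 2.0, asymmetry 0.5 (t-ms = 2.5, t-sm = 1.5), true offset 1.0:
print(offset_with_asymmetry(10.0, 13.5, 20.0, 20.5, 0.5))  # 1.0
```

Without the correction, the same timestamps would yield an offset of 1.5, an error equal to the asymmetry.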
Packet Encapsulation
1588v2 defines the following packet encapsulation modes:
Layer 2 multicast encapsulation through a multicast MAC address
The EtherType field is 0x88F7, and the multicast MAC address is 01-80-C2-00-00-0E
(in PDelay messages) or 01-1B-19-00-00-00 (in non-PDelay messages).
1588v2 recommends that the Layer 2 multicast encapsulation mode be used. The NE20E
supports Layer 2 multicast encapsulation with tags. Figure 1-117 shows the Layer 2
multicast encapsulation without tags.
BITS Interface
1588v2 enables clock nodes to synchronize with each other, but cannot enable them to
synchronize with Greenwich Mean Time (GMT). If the clock nodes need to synchronize with
GMT, an external time source is required. That is, the GM needs to be connected to an
external time source to obtain the reference time in non-1588v2 mode.
Currently, the external time sources come from satellite systems, such as GPS from the
United States, Galileo from Europe, GLONASS from Russia, and BeiDou from China.
Figure 1-121 shows how the GM and an external time source are connected.
Clock Synchronization
In addition to time synchronization, 1588v2 can be used for clock synchronization, that is,
frequency recovery can be achieved through 1588v2 messages.
1588v2 time synchronization in Delay or PDelay mode requires the device to periodically
send Sync messages to its peer.
Each Sync message carries a sending timestamp. After receiving the Sync message, the
peer adds a receiving timestamp to it. When the link delay is stable, the two timestamps
advance at the same pace. If the receiving timestamps advance faster or slower than the
sending timestamps, the clock of the receiving device runs faster or slower than the clock
of the sending device and needs to be adjusted. Once adjusted, the frequencies of the two
devices are synchronized.
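This syntonization idea can be sketched numerically. The timestamps are hypothetical, and the ppm computation is illustrative, not the NE20E algorithm.

```python
# Compare how fast receive timestamps advance relative to send timestamps
# between two Sync messages; 0 ppm means the clocks run at the same rate.
def frequency_error_ppm(t1_a, t1_b, t2_a, t2_b):
    """t1_*: master send times; t2_*: slave receive times (stable link delay)."""
    return ((t2_b - t2_a) / (t1_b - t1_a) - 1.0) * 1e6

# Master interval 1.000000 s, slave measures 1.000010 s: slave fast by ~10 ppm.
print(round(frequency_error_ppm(0.0, 1.0, 5.0, 6.00001), 3))
```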
The frequency restored through 1588v2 messages has a lower accuracy than the frequency
restored through synchronous Ethernet. Therefore, it is recommended to perform frequency
synchronization through synchronous Ethernet and time synchronization through 1588v2.
1588v2 restores the frequency in the following modes:
Hop-by-hop
In hop-by-hop mode, all devices on a link are required to support 1588v2. The frequency
recovery in this mode is highly accurate. In the case of a small number of hops, the
frequency recovery accuracy can meet the requirement of ITU-T G.813 (stratum 3
standard).
End-to-end (Delay and jitter may occur on the transit network.)
In end-to-end mode, the forwarding devices do not need to support 1588v2, and the
delay of the forwarding path is only required to meet a specified level, for example, less
than 20 ms. The frequency recovery accuracy in this mode is low, and can meet only the
requirements of the G.8261 and base stations (50 ppb) rather than that of the stratum 3
clock standard.
To achieve high frequency recovery accuracy, 1588v2 requires Sync messages to be sent at a
high rate of at least 100 packets/s.
The NE20E meets the following clock standards:
G.813 and G.823 for external clock synchronization
G.813 and G.823/G.824 for E1 clocks
G.8261 and G.8262 for synchronous Ethernet clocks
G.8261 and G.823/G.824 for frequency recovery through 1588v2 messages
At present, the NE20E supports frequency recovery through 1588v2 messages in
hop-by-hop mode, but not in end-to-end mode or across networks with packet delay
variation (PDV). The NE20E is not committed to G.813 or G.8262 compliance for this feature.
Because a master clock serves multiple slave clocks, it is recommended that a BITS or IP
clock server be used as the master clock. Using an ordinary router as the master clock is not
recommended because its CPU may become overloaded.
As shown in Figure 1-122, clock servers and NodeBs exchange TOP-encapsulated 1588
messages over a QoS-enabled bearer network with the jitter being less than 20 ms.
Scenario description:
NodeBs only need frequency synchronization.
The bearer network does not support 1588v2 or frequency recovery in synchronous
Ethernet mode.
Solution description:
The bearer network is connected to a wireless IP clock server and adopts 1588v2 clock
synchronization and frequency recovery in E2E mode.
The clock server sends 1588v2 timing messages, which are transparently transmitted
over the bearer network to NodeBs. Upon receiving the timing messages, NodeBs
perform frequency recovery.
1588v2 timing messages need to be transparently transmitted by priority over the bearer
network; the E2E jitter on the bearer network must be less than 20 ms.
Advantage of the solution: Devices on the bearer network are not required to support
1588v2, and are therefore easily deployed.
Disadvantage of the solution: Only frequency synchronization rather than time
synchronization is performed. In practice, an E2E jitter of less than 20 ms is not ensured.
As shown in Figure 1-123, the clock source can send clock signals to NodeBs through the
1588v2 clock, WAN clock, synchronous Ethernet clock, or any combination of clocks.
Scenario description:
NodeBs only need frequency synchronization.
GE links on the bearer network support the 1588v2 clock rather than the synchronous
Ethernet clock.
Solution description:
The Synchronous Digital Hierarchy (SDH) or synchronous Ethernet clock sends stratum
3 clock signals through physical links. On the GE links that do not support the
synchronous Ethernet clock, stratum 3 clock signals are transmitted through 1588v2.
Advantage of the solution: The solution is simple and flexible.
Disadvantage of the solution: Only frequency synchronization rather than time
synchronization is performed.
Figure 1-124 Networking diagram of the bearer and wireless networks in the same clock domain
Scenario description:
NodeBs need to synchronize time with each other.
The bearer and wireless networks are in the same clock domain.
Solution description:
The core node supports GPS or BITS clock interfaces.
All nodes on the bearer network function as BC nodes, which support the link delay
measurement mechanism to handle fast link switching.
Links or devices that do not support 1588v2 can be connected to devices with GPS or
BITS clock interfaces to perform time synchronization.
Advantage of the solution: The time of all nodes is synchronous on the entire network.
Disadvantage of the solution: All nodes on the entire network must support 1588v2.
Figure 1-125 Networking diagram of the bearer and wireless networks in different clock domains
Scenario description:
NodeBs need to synchronize time with one another.
The bearer and wireless networks are in different clock domains.
Solution description:
Core nodes support GPS/BITS interfaces.
Network-wide time synchronization is achieved from the core node in T-BC mode. All
T-BC nodes support path delay measurement to adapt to fast link switching.
Network-wide synchronization can be traced to two grand masters.
The advantage of the solution is that the network-wide time is synchronized to ensure the
optimal tracing path.
The disadvantage of the solution is that all nodes on the network need to support 1588v2
and G.8275.1.
Terms

Synchronization: On a modern communications network, in most cases, the proper functioning
of telecommunications services requires network clock synchronization, meaning that the
frequency offset or time difference between devices must be kept in an acceptable range.
Network clock synchronization includes time synchronization and frequency synchronization.
− Time synchronization, also called phase synchronization, refers to the consistency of
both frequencies and phases between signals. This means that the phase offset between
signals is always 0.
− Frequency synchronization, also called clock synchronization, refers to a strict
relationship between signals based on a constant frequency offset or a constant phase
offset, in which signals are sent or received at the same average rate over any given
interval. In this manner, all devices on the communications network operate at the same
rate; that is, the phase difference between signals remains a fixed value.

IEEE 1588v2/PTP: IEEE 1588v2, defined by the Institute of Electrical and Electronics
Engineers (IEEE), is the standard entitled Precision Clock Synchronization Protocol for
Networked Measurement and Control Systems, called the Precision Time Protocol (PTP)
for short.

Clock domain: Logically, a physical network can be divided into multiple clock domains.
Each clock domain has a reference time, with which all devices in the domain are
synchronized. Different clock domains have their own reference time, independent of
each other.

Clock node: Each node on a time synchronization network is a clock. The 1588v2 protocol
defines three types of clocks: OC, BC, and TC.

Clock reference source: Clock reference source selection is a method of selecting reference
clocks based on the clock selection algorithm.

One-step mode: In one-step mode, Sync messages in Delay mode and PDelay_Resp messages
in PDelay mode are stamped with the time when the messages are sent.

Two-step mode: In two-step mode, Sync messages in Delay mode and PDelay_Resp messages
in PDelay mode only record the time when the messages are sent but carry no timestamps;
the timestamps are carried in subsequent messages, such as Follow_Up and
PDelay_Resp_Follow_Up messages.
Abbreviations
Abbreviation Full Spelling
1588v2 Precision Time Protocol
Definition
1588 Adaptive Time Recovery (ATR) is a PTP-based technology that allows routers to
establish clock links and implement time synchronization over a third-party network using
PTP packets in Layer 3 unicast mode.
1588 ATR is an advancement compared to 1588v2, the latter of which requires 1588v2
support on all network devices.
1588 ATR is a client/server protocol through which servers communicate with clients to
achieve time synchronization.
Purpose
1588v2 is a software-based technology used to achieve frequency and time synchronization
and can support hardware timestamping to provide greater accuracy. However, 1588v2
requires support from all devices on the live network.
To address this disadvantage, 1588 ATR is introduced to allow time synchronization over a
third-party network that includes 1588v2-incapable devices. On the live network, 1588v2 is
preferred for 1588v2-capable devices, and 1588 ATR is used when 1588v2-incapable devices
exist.
Benefits
This feature offers the following benefits to carriers:
Does not require 1588v2 to be supported by all network devices, reducing network
construction costs.
Fits for more network applications that meet time synchronization requirements.
Features Supported
The 1588 ATR features supported by NE20Es are as follows:
An NE20E that functions as a 1588 ATR server can synchronize time information with
upstream devices using the BITS source and transmit time information to downstream
devices.
An NE20E that functions as a 1588 ATR server can synchronize time information with
upstream devices using 1588v2/G.8275.1 and transmit time information to downstream
devices.
An NE20E can function only as the 1588 ATR server. The following restrictions apply to network
deployment:
When 1588 ATR is used to implement time synchronization over a third-party network, reduce the
packet delay variation (PDV) and the number of devices on the third-party network as much as
possible in order to ensure time synchronization performance on clients. For details, see
performance specifications for clients.
The server and client communicate with each other through PTP packets, which can be either
Layer 3 IP packets or single-VLAN-tagged packets. The PTP packets cannot carry two VLAN
tags or an MPLS label.
The interface used to send PTP packets on the server must support 1588v2.
1.4.16.2 Principles
1.4.16.2.1 Principles of 1588 ATR
1588 ATR is used to deliver time synchronization between clients and servers (routers).
After clock links are established through negotiation between clients and servers, 1588 ATR
uses PTP packets in Layer 3 unicast mode to obtain the clock difference between clients and
servers and then implement time synchronization based on the difference.
Synchronization Process
After negotiation is complete, 1588 ATR servers exchange PTP packets with clients to
implement time synchronization.
1588 ATR works in one-way or two-way mode.
One-way mode
a. The server sends PTP packets that carry timestamps t1 and t1' to the client.
b. The client receives PTP packets at timepoints t2 and t2'. Timestamps t1 and t1'
indicate the server-side clock information, and timestamps t2 and t2' indicate the
client-side clock information. The server-side and client-side timestamps are
compared to obtain the frequency offset between the server and client, which is
used for frequency synchronization. For example, if (t2-t1)/(t2'-t1') is 1, the
frequency on the server is the same as that on the client. Otherwise, the client
frequency needs to be synchronized with the server.
Two-way mode
a. The server sends a Sync packet carrying timestamp t1 to the client.
b. The client receives the Sync packet at timepoint t2.
c. The client sends a Delay_Req packet carrying timestamp t3 to the server.
d. The server receives the Delay_Req packet at timepoint t4, generates a Delay_Resp
packet carrying t4, and sends it to the client. The client then uses timestamps t1
through t4 to calculate its time offset from the server.
A 1588 ATR server supports both the one-way and two-way modes by default. A 1588 ATR
client supports either the one-way or two-way mode.
If bandwidth permits, the two-way mode is recommended for 1588 ATR deployment.
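The offset math behind the two modes can be sketched as follows (a minimal illustration; the timestamp values and function names are hypothetical, not part of the 1588 ATR specification):

```python
# One-way mode: the client compares server send times (t1, t1') with its
# own receive times (t2, t2'). If (t2 - t1) / (t2' - t1') equals 1, the
# server and client frequencies match.
def frequency_ratio(t1, t1p, t2, t2p):
    return (t2 - t1) / (t2p - t1p)

# Two-way mode: with all four timestamps known and a symmetric path
# delay assumed, the client's offset from the server is
# ((t2 - t1) - (t4 - t3)) / 2.
def time_offset(t1, t2, t3, t4):
    return ((t2 - t1) - (t4 - t3)) / 2

# Hypothetical timestamps, in seconds
print(frequency_ratio(0.0, 1.0, 5.0, 6.0))  # 1.0: frequencies match
print(time_offset(10.0, 12.0, 13.0, 11.0))  # 2.0: client is 2 s ahead
```

Note that the two-way estimate holds only when the forward and reverse path delays are symmetric, which is one reason low PDV matters for client accuracy.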
Duration Mechanism
A 1588 ATR client specifies the duration for which it requests Announce, Sync, and
Delay_Resp packets. The duration is carried in the TLV field of Signaling packets sent to
the server.
In normal situations, a client initiates renegotiation with a server before the duration expires
so that the server can continue providing synchronization to the client.
If a client goes Down, it cannot initiate renegotiation. After the duration expires, the
server no longer sends synchronization packets to the client.
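The duration mechanism behaves like a lease between client and server. The following toy model illustrates it (class and method names are hypothetical; real 1588 ATR carries the duration in a Signaling-packet TLV, as described above):

```python
class AtrServer:
    """Toy model of the duration (lease) mechanism: the server serves a
    client only while the negotiated duration has not expired."""

    def __init__(self):
        self.leases = {}  # client -> lease expiry time

    def negotiate(self, client, now, duration):
        # (Re)negotiation extends the lease by the requested duration.
        self.leases[client] = now + duration

    def should_sync(self, client, now):
        # Synchronization packets are sent only while the lease is alive.
        return self.leases.get(client, 0) > now

server = AtrServer()
server.negotiate("client-1", now=0, duration=300)
print(server.should_sync("client-1", now=100))  # True: lease alive
# A client that goes Down cannot renegotiate, so its lease lapses:
print(server.should_sync("client-1", now=400))  # False: lease expired
```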
Per-hop BC + Server
1588 ATR servers can synchronize time with upstream devices and send the time source
information to clients.
1.4.16.3 Applications
On the IP RAN shown in the following figure, time synchronization needs to be performed
between NodeBs, but the third-party network (such as a microwave or switch network) does
not support 1588v2. In this case, 1588 ATR can be configured to allow time synchronization
over the third-party network. Routers enabled with 1588 ATR can function as a BC to
synchronize time information with upstream devices and as a 1588 ATR server to synchronize
time information with NodeBs.
Terms

Synchronization: Most telecommunication services running on a modern communications
network require network-wide synchronization. Synchronization means that the frequency
offset or time difference between devices must remain in a specified range. Clock
synchronization is categorized as frequency synchronization or time synchronization.

Time synchronization: Time synchronization, also known as phase synchronization, refers to
the consistency of both frequencies and phases between signals. That is, the phase offset
between signals is always 0.

Frequency synchronization: Frequency synchronization, also known as clock synchronization,
refers to the strict relationship between signals based on a constant frequency or phase
offset, in which signals are sent or received at the same average rate over any given interval.
In this manner, all devices on the communications network operate at the same rate; that is,
the phase difference between signals remains a constant value.

IEEE 1588v2/PTP: A standard entitled Precision Clock Synchronization Protocol for
Networked Measurement and Control Systems, defined by the Institute of Electrical and
Electronics Engineers (IEEE). It is also called the Precision Time Protocol (PTP).
Background
As the commercialization of LTE-TDD and LTE-A accelerates, there is a growing need for
time synchronization on base stations. Traditionally, the GPS and PTP solutions were used on
base stations to implement time synchronization.
The GPS solution requires a GPS antenna to be deployed on each base station, leading to high
TCO. The PTP solution requires 1588v2 support on network-wide devices, resulting in huge
network reconstruction costs for carriers.
Furthermore, GPS antennas can properly receive data from GPS satellites only when they are
placed outdoors and meet installation angle requirements. For indoor deployment, long
feeders must penetrate walls, and site selection requires careful consideration because of
demanding lightning protection requirements. These disadvantages lead to high TCO and
make GPS antenna deployment challenging for indoor devices. Another weakness is that most
indoor equipment rooms are leased, which imposes strict requirements on coaxial cables
penetrating walls and involves complex application procedures. For example, for security
reasons, laws and regulations in Japan prohibit deploying radio frequency (RF) cables into
rooms through walls.
To address the preceding challenges, the Atom GPS timing system is introduced to NE20Es.
Specifically, an Atom GPS module which is comparable to a lightweight BITS device is
inserted to an NE20E to provide GPS access to the bearer network. Upon receipt of GPS
clock signals, the Atom GPS module converts them into SyncE signals and then sends the
SyncE signals to NE20Es. Upon receipt of GPS time signals, the Atom GPS module converts
them into 1588v2 signals and then sends the 1588v2 signals to base stations. This mechanism
greatly reduces the TCO for carriers.
Benefits
This feature offers the following benefits to carriers:
For newly created time synchronization networks, the Atom GPS timing system reduces
the deployment costs by 80% compared to traditional time synchronization solutions.
For the expanded time synchronization networks, the Atom GPS timing system can reuse
the legacy network to protect investment.
1.4.17.2 Principles
1.4.17.2.1 Modules
The Atom GPS timing system includes two types of modules: Atom GPS modules and
clock/time processing modules on routers.
1.4.17.3 Applications
On the network shown in the following figure, the Atom GPS timing feature is mainly used in
three synchronization solutions:
SyncE frequency synchronization + Atom GPS time synchronization
On networks that do not support time synchronization, this solution allows time
synchronization with an Atom GPS module inserted into a router.
Atom GPS frequency synchronization + 1588v2 time synchronization
On networks that do not support frequency synchronization, this solution allows
frequency synchronization with an Atom GPS module inserted into a router.
Atom GPS frequency synchronization + Atom GPS time synchronization
On networks that cannot be reconstructed, this solution allows time and frequency
synchronization with an Atom GPS module inserted into a router.
Terms

Synchronization: Most telecommunication services running on a modern communications
network require network-wide synchronization. Synchronization means that the frequency
offset or time difference between devices must remain in a specified range. Clock
synchronization is categorized as frequency synchronization or time synchronization.

Time synchronization: Time synchronization, also known as phase synchronization, refers to
the consistency of both frequencies and phases between signals. That is, the phase offset
between signals is always 0.

Frequency synchronization: Frequency synchronization, also known as clock synchronization,
refers to the strict relationship between signals based on a constant frequency or phase
offset, in which signals are sent or received at the same average rate over any given interval.
In this manner, all devices on the communications network operate at the same rate; that is,
the phase difference between signals remains a constant value.

IEEE 1588v2/PTP: A standard entitled Precision Clock Synchronization Protocol for
Networked Measurement and Control Systems, defined by the Institute of Electrical and
Electronics Engineers (IEEE). It is also called the Precision Time Protocol (PTP).
1.4.18 NTP
The Network Time Protocol (NTP) is supported only by a physical system (PS).
1.4.18.1 Introduction
Definition
The Network Time Protocol (NTP) is an application layer protocol in the TCP/IP protocol
suite. NTP synchronizes the time among a set of distributed time servers and clients. NTP is
built on the Internet Protocol (IP) and User Datagram Protocol (UDP). NTP messages are
transmitted over UDP, using port 123.
NTP evolved from the Time Protocol and the ICMP Timestamp message but is specifically
designed to maintain time accuracy and robustness.
Purpose
In the NTP model, a number of primary reference sources, synchronized to national standards
by wire or radio, are connected to widely accessible resources, such as backbone gateways.
These gateways act as primary time servers. The purpose of NTP is to convey timekeeping
information from these primary time servers to other time servers (secondary time servers).
Secondary time servers are synchronized to the primary time servers. The servers are
connected in a logical hierarchy called a synchronization subnet. Each level of the
synchronization subnet is called a stratum. For example, the primary time servers are stratum
1, and the secondary time servers are stratum 2. Servers with larger stratum numbers are more
likely to have less accurate clocks than those with smaller stratum numbers.
When multiple time servers exist on a network, a clock selection algorithm can be used to
select a server based on the stratums and time offsets of the time servers. This helps improve
local clock precision.
Implementation
Figure 1-132 illustrates the process of implementing NTP. Device A and Device B are
connected through a wide area network (WAN). They both have independent system clocks
that are synchronized through NTP.
In the following example:
Before Device A synchronizes its system clock to Device B, the clock of Device A is
10:00:00 am and the clock of Device B is 11:00:00 am.
Device B functions as an NTP server, and Device A must synchronize its clock signals
with Device B.
It takes 1 second to transmit an NTP packet between Device A and Device B.
It takes 1 second for Device A and Device B to process an NTP packet.
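Under these assumptions, the standard NTP offset and delay formulas give the following result (a worked sketch; clock readings are expressed as seconds since midnight for readability):

```python
def hms(h, m, s):
    """Convert a clock reading to seconds since midnight."""
    return h * 3600 + m * 60 + s

t1 = hms(10, 0, 0)  # Device A sends its request (A's clock)
t2 = hms(11, 0, 1)  # Device B receives it (B's clock)
t3 = hms(11, 0, 2)  # Device B sends its reply (B's clock)
t4 = hms(10, 0, 3)  # Device A receives the reply (A's clock)

# Standard NTP formulas:
offset = ((t2 - t1) + (t3 - t4)) / 2  # B's clock relative to A's
delay = (t4 - t1) - (t3 - t2)         # round-trip network delay

print(offset)  # 3600.0: Device B is 1 hour ahead of Device A
print(delay)   # 2: 1 s in each direction
```

Device A therefore advances its clock by the computed offset, so at the moment the reply arrives its clock reads 11:00:03 am, matching Device B.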
1.4.18.2 Principles
1.4.18.2.1 NTP Implementation Model
Using the NTP implementation model, a client creates the following processes with each peer:
Transmit process
Receive process
Update process
These processes share a database and are interconnected through a message-transfer system.
When the client has multiple peers, its database is divided into several parts, with each part
dedicated to a peer.
Figure 1-133 shows the NTP implementation model.
Transmit Process
The transmit process, controlled by each timer for peers, collects information in the database
and sends NTP messages to the peers.
Each NTP message contains a local timestamp marking when the message is sent or received
and other information necessary to determine a clock stratum and manage the association. The
rate at which messages are sent is determined by the precision required by the local clock and
its peers.
Receive Process
The receive process receives messages, including NTP messages and other protocol messages,
as well as information sent by directly connected radio clocks.
When receiving an NTP message, the receive process calculates the offset between the peer
and local clocks and incorporates it into the database along with other information that is
useful for locating errors and selecting peers.
Update Process
The update process handles the offset of each peer after receiving NTP response messages and
selects the most precise peer using a specific selection algorithm.
This process may involve either many observations of few peers or a few observations of
many peers, depending on the accuracy.
The functions of the primary and secondary time servers are as follows:
A primary time server is directly synchronized to a primary reference source, usually a
radio clock or global positioning system (GPS).
A secondary time server is synchronized to another secondary time server or a primary
time server. Secondary time servers use NTP to send time information to other hosts in a
Local Area Network (LAN).
When there is no fault, primary and secondary servers in the synchronization subnet assume a
hierarchical master-slave structure, with the primary servers at the root and secondary servers
at successive stratums toward the leaf nodes. The larger the stratum number, the less precise
the clock (stratum 1 being the most precise).
As the stratum increases from one, the clock sample accuracy gradually decreases, depending
on the network paths and local-clock stabilities. To prevent tedious calculations necessary to
estimate errors in each specific configuration, it is useful to calculate proportionate errors.
Proportionate errors are approximate and based on the delay and dispersion relative to the root
of the synchronization subnet.
This design helps the synchronization subnet in automatically reconfiguring the hierarchical
master-slave structure to produce the most accurate and reliable time, even when one or more
primary or secondary servers or the network paths in the subnet fail. If all primary servers fail,
one or more backup primary servers continue operations. If all primary servers over the
subnet fail, the remaining secondary servers then synchronize among themselves. In this case,
distances reach upwards to a pre-selected maximum "infinity".
Upon reaching the maximum distance to all paths, a server drops off the subnet and runs
freely based on its previously calculated time and frequency. Because these computations are
expected to be very precise, especially in terms of frequency, the timekeeping errors of a
device with a stabilized oscillator are no more than a few milliseconds per day.
In the case of multiple primary servers, a specific selection algorithm is used to select the
server at a minimum synchronization distance. When these servers are at approximately the
same synchronization distance, they may be selected randomly.
Random selection does not decrease accuracy when the offset between the primary
servers is less than the synchronization distance.
When the offset between the primary servers is greater than the synchronization distance,
filtering and selection algorithms are used to select the best available servers and discard
the others.
Peer Mode
In peer mode, the active and passive ends can be synchronized. The end with a lower stratum
(larger stratum number) is synchronized to the end with a higher stratum (smaller stratum
number).
Symmetric active: A host operating in this mode periodically sends messages regardless
of the reachability or stratum of its peer. The host announces its willingness to
synchronize and be synchronized by its peer.
The symmetric active end is a time server close to the leaf node in the synchronization
subnet. It has a low stratum (large stratum number). In this mode, time synchronization
is reliable. A peer is configured on the same stratum and two peers are configured on the
stratum one level higher (one stratum number smaller). In this case, synchronization poll
frequency is not important. Even when error packets are returned because of connection
failures, the local clocks are not significantly affected.
Symmetric passive: A host operating in this mode receives packets and responds to its
peer. The host announces its willingness to synchronize and be synchronized by its peer.
The prerequisites of being a symmetric passive host are as follows:
− The host receives messages from a peer operating in the symmetric active mode.
− The peer is reachable.
− The peer operates at a stratum lower than or equal to the host.
The host operating in the symmetric passive mode is at a low stratum in the synchronization
subnet. It does not need prior knowledge of its peers: a connection between peers is set up,
and status variables are updated, only when the symmetric passive end receives NTP
messages from a peer.
In NTP peer mode, the active end functions as a client and the passive end functions as a server.
Client/Server Mode
Client: A host operating in this mode periodically sends messages regardless of the
reachability or stratum of the server. The host synchronizes its clock with that on the
server but does not alter the clock on the server.
Server: A host operating in this mode receives packets and responds to the client. The
host provides synchronization information for all its clients but does not alter its own
clock.
A host operating in the client mode periodically sends NTP messages to a server during and
after its restart. The server does not need to retain state information when the client sends the
request. The client freely manages the interval for sending packets according to actual
conditions.
Kiss-o'-Death (KOD) packets provide useful information to a client and are used for status
reporting and access control. When KOD is enabled on the server, the server can send packets
with kiss codes DENY and RATE to the client.
After the client receives a packet with kiss code DENY, the client demobilizes any
associations with that server and stops sending packets to that server.
After the client receives a packet with kiss code RATE, the client immediately reduces
its polling interval to that of the server and continues to reduce it each time it receives a
RATE kiss code.
Broadcast Mode
A host operating in broadcast mode periodically sends clock-synchronization packets to
the broadcast IPv4 address regardless of the reachability or stratum of the clients. The
host provides synchronization information for all its clients but does not alter its own
clock.
A client listens to the broadcast packets sent by the server. When receiving the first
broadcast packet, the client temporarily starts in the client/server mode to exchange
packets with the server. This allows the client to estimate the network delay. The client
then reverts to the broadcast mode, continues to listen to the broadcast packets, and
re-synchronizes the local clock based on the received broadcast packets.
The broadcast mode is intended for high-speed LANs with many workstations where the
highest accuracy is not required. In a typical scenario, one or more time servers in a LAN
periodically send broadcast packets to the workstations. The LAN packet transmission delay
is only milliseconds.
If multiple time servers are available to enhance reliability, a clock selection algorithm is
useful.
Multicast Mode
A host operating in the multicast mode periodically sends clock-synchronization packets
to a multicast IPv4/IPv6 address. The host is usually a time server using high-speed
multicast media in a LAN. The host provides synchronization information for all its
peers but does not alter its own clock.
A client listens to multicast packets sent by the server. After receiving the first multicast
packet, the client temporarily starts in the client/server mode to exchange packets with
the server. This allows the client to estimate the network delay. The client then reverts to
the multicast mode, continues to listen to the multicast packets, and re-synchronizes the
local clock based on the received multicast packets.
Manycast Mode
A client operating in manycast mode sends periodic request packets to a designated IPv4
or IPv6 multicast address in order to search for a minimum number of associations. It
starts with a time to live (TTL) value of one and increments it by one until the minimum
number of associations is made or the TTL reaches a maximum value. If the TTL reaches its
maximum value and not enough associations have been mobilized, the client stops
transmission for a timeout period to clear all associations, and
then repeats the search process. If a minimum number of associations have been
mobilized, then the client starts transmitting one packet per timeout period to maintain
the associations.
A designated manycast server within range of the TTL field in the packet header listens
for packets with that address. If a server is suitable for synchronization, it returns an
ordinary server (mode 4) packet using the client's unicast address.
Manycast mode is applied to a small set of servers scattered over a network. Clients can
discover and synchronize to the closest manycast server. Manycast can especially be used
where the identity of the server is not fixed and a change of server does not require
reconfiguration of all the clients on the network.
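The expanding-TTL search that a manycast client performs can be sketched as follows (a simplified simulation; the server names and the mapping of TTL values to reachable servers are hypothetical):

```python
def manycast_search(servers_at_ttl, min_assoc=3, max_ttl=7):
    """Start with TTL 1 and widen the multicast scope one hop at a time
    until enough associations are mobilized or the TTL ceiling is hit."""
    associations = set()
    ttl = 1
    while ttl <= max_ttl:
        # Servers within this TTL scope answer with unicast mode-4 packets.
        associations |= servers_at_ttl.get(ttl, set())
        if len(associations) >= min_assoc:
            return associations, ttl
        ttl += 1
    # Not enough associations: the client would time out, clear all
    # associations, and repeat the search.
    return associations, max_ttl

found, ttl = manycast_search({1: {"s1"}, 2: {"s2"}, 3: {"s3", "s4"}})
print(sorted(found), ttl)  # ['s1', 's2', 's3', 's4'] 3
```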
NTP Operation
A host operating in an active mode (symmetric active, client or broadcast mode) must be
configured.
Its peer operating in a passive mode (symmetric passive or server mode) requires no
pre-configuration.
An error occurs when the host and its peer operate in the same mode. In such a case, one
ignores messages sent by the other, and their associations are then dissolved.
Transmit Process
In all modes (except the client mode with a broadcast server and the server mode), the
transmit process starts when the peer timer expires. In the client mode with a broadcast server,
messages are never sent. In the server mode, messages are sent only in response to received
messages. This process is also invoked by the receive process when the received NTP
message does not result in a local persistent association. To ensure a valid response, the
transmit timestamp must be added to packets to be sent. Therefore, the values of variables
carried in the response packet must be accurately saved.
Broadcast and multicast servers that are not synchronized will start the transmit process when
the peer timer expires.
Receive Process
The receive process starts when an NTP message arrives. First, it checks the mode field in the
packet. Value 0 indicates that the peer runs an earlier NTP version. If the version number in
the packet matches the current version, the receive process continues with the following steps.
If the version numbers do not match, the packet is discarded, and the association (if not
pre-configured) is dissolved. The receive process then varies according to the result of
combining the local and remote clock modes:
If both the local and remote hosts are operating in client mode, an error occurs, and the
packet is discarded.
If the result is recv, the packet is processed, and the association is marked reachable if
the received packet contains a valid header. In addition, if the received packet contains
valid data, the clock-update process is called to update the local clock. If the association
was not previously configured, it is dissolved.
If the result is xmit, the packet is processed, and an immediate response packet is sent.
The association is then dissolved if it is not pre-configured.
If the result is pkt, the packet is processed, and the association is marked reachable if the
received packet contains a valid header. In addition, if the received packet contains valid
data, the clock-update process is called to update the local clock. If the association was
not pre-configured, an immediate reply is sent, and the association is dissolved.
Packet Process
The packet process checks message validity, calculates delay/offset samples, and invokes
other processes to filter data and select a reference source. First, the transmit timestamp must
be different from the transmit timestamp in the last message. If the transmit timestamps are
the same, the message may be an outdated duplicate.
Second, the originate timestamp must match the last message sent to the same peer. If a
mismatch occurs, the message may be out of order, forged, or defective.
Lastly, the packet process uses a clock selection algorithm to select the best clock sample
from the specified clocks or clock groups at different stratums. The delay (peer delay), offset
(peer offset), and dispersion (peer dispersion) for the peer are all determined.
Clock-Update Process
After the offset, delay, and dispersion of the valid clock are determined by the clock-filter
process, the clock-selection process invokes the clock-update process. The result of the
clock-selection and clock-combining processes is the final clock correction value, which the
local-clock process uses to update the local clock. If no reference source is found after these
processes, the clock-update process performs no further operation.
The clock-selection process is then invoked. It uses two algorithms: intersection and clustering.
The intersection algorithm generates a list of candidate peers suitable to be the reference
source and calculates a confidence interval for each peer. It discards falsetickers using a
technique adopted from Marzullo and Owicki [MAR85].
The clustering algorithm orders the list of remaining candidates based on their stratums
and synchronization distances. It repeatedly discards outlier peers based on the
dispersion until only the most accurate, precise, and stable candidates remain.
If the offset, delay, and dispersion of the candidate peers are almost identical, the candidates
are combined to analyze the clock situation comprehensively, and the resulting parameters are
provided to the local end to update the local clock.
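The falseticker-discarding technique adopted from Marzullo and Owicki [MAR85] can be illustrated with a simplified interval-intersection sketch. This is an assumption-laden illustration of the idea only; the production NTP selection algorithm differs in detail.

```python
# Illustrative sketch of the interval-intersection idea from [MAR85].

def marzullo(intervals):
    """intervals: (lo, hi) confidence intervals, one per candidate peer.
    Returns (count, (lo, hi)): the sub-interval covered by the largest
    number of peers. Peers whose intervals miss it are falsetickers."""
    events = []
    for lo, hi in intervals:
        events.append((lo, -1))   # -1 marks an interval start
        events.append((hi, +1))   # +1 marks an interval end
    events.sort()                 # starts sort before ends at equal offsets
    best = cnt = 0
    best_range = None
    for i, (offset, kind) in enumerate(events):
        cnt -= kind               # start: cnt += 1; end: cnt -= 1
        if cnt > best:
            best = cnt
            best_range = (offset, events[i + 1][0])
    return best, best_range

# Peers at 10+/-2 and 12+/-1 agree on (11, 12); the peer at 14.5+/-0.5
# falls outside the majority interval and is discarded as a falseticker.
assert marzullo([(8, 12), (11, 13), (14, 15)]) == (2, (11, 12))
```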
Static Associations
Static associations are set up using commands.
Dynamic Associations
Dynamic associations are set up when an NTP packet is received by the client or peer.
Access Control
NTP is designed to withstand accidental or malicious data modification or destruction, so that
such problems typically do not cause timekeeping errors on other time servers in the
synchronization subnet. This design, however, relies on redundant time servers and diverse
network paths, and assumes that data modification or destruction does not occur
simultaneously on many time servers across the synchronization subnet. To reduce subnet
vulnerability, select trusted time servers and allow only them to serve as clock sources.
1.4.18.3 Applications
Applicable Environment
The synchronization of clocks over the network is increasingly important as the network
topology becomes increasingly complex. NTP was developed to implement the
synchronization of system clocks over the network.
Application Instances
As shown in Figure 1-137, the time server B in the LAN is synchronized to the time server A
on the Internet, and the hosts in the LAN are synchronized to the time server B in the LAN. In
this way, the hosts are synchronized to the time server on the Internet.
1.4.19 OPS
1.4.19.1 Overview
Definition
The Open Programmability System (OPS) is an open platform that provides Representational
State Transfer (RESTful) Application Programming Interfaces (APIs) to achieve
programmability, allowing third-party applications to run on the platform.
The OPS also supports embedded third-party applications and an event subscription
mechanism. Using the OPS, users can deploy supplementary functions that facilitate service
extension and intelligent management of devices, reducing their operation and maintenance
costs.
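In principle, a third-party application interacts with such RESTful APIs through standard HTTP operations. The following sketch shows the general shape of an authenticated RESTful GET request; the host, path, and credential handling are hypothetical placeholders, not documented OPS endpoints.

```python
# Hedged sketch of a generic RESTful API call (hypothetical endpoint,
# not a documented OPS interface).
import base64
import http.client

def ops_get(host, path, user, password):
    """Issue an authenticated HTTPS GET and return (status, body)."""
    conn = http.client.HTTPSConnection(host, 443, timeout=10)
    token = base64.b64encode(f'{user}:{password}'.encode()).decode()
    conn.request('GET', path, headers={
        'Authorization': 'Basic ' + token,  # only authorized users may call
        'Accept': 'application/xml',
    })
    resp = conn.getresponse()
    return resp.status, resp.read()

# usage (hypothetical URI):
# status, body = ops_get('device.example.com', '/restconf/data/system', 'user', '***')
```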
Purpose
Conventional network devices provide only limited functions and predefined services. As the
network develops, the static and inflexible service provisioning mode cannot meet the
requirements for diversified and differentiated services. Some customers require devices with
specific openness, on which they can develop their own functions and deploy proprietary
management policies to implement automatic management and maintenance, lowering
management costs.
To meet the preceding customer requirements, Huawei offers the OPS, an open platform with
programmability. The OPS allows users or third-party developers to develop and deploy
network management policies using open RESTful APIs. With this programmability, the OPS
implements rapid service expansion, automatic function deployment, and intelligent device
management, helping reduce network operation and maintenance costs and simplify network
operations.
Benefits
The OPS offers the following benefits:
Supports user-defined configurations and applications, implementing flexible service
deployment and simplifying network device management.
Uses various third-party programs, improving network utilization.
Allows users to develop private services.
Facilitates application deployment.
Security
The OPS provides the following security measures:
API security
Only authorized users can operate the OPS.
Operation security
Resources are isolated in modules in the OPS and their usage can be monitored.
Program security
Third-party resources are used to manage programs.
Important information security
OPS APIs use secure communication protocols to prevent information leakage during
transmission. In addition, users must ensure local security when operating on and
saving important information.
1.4.19.2 Principles
1.4.19.2.1 OPS Architecture
Figure 1-138 shows the OPS architecture. The OPS is developed based on the Huawei
proprietary Versatile Routing Platform (VRP) and allows customized applications to
interwork with the modules on the management, control, and resource planes on the VRP
through open application programming interfaces (APIs).
1.4.19.3 Application
1.4.19.3.1 Maintenance Assistant Applications
Figure 1-139 Device automatically collects health information and sends it to a TFTP/FTP server
Terms
Term Definition
OPS Open Programmability System. A system that
provides APIs for users or third-party developers
to program self-defined applications, which
facilitate service extension and automatic
management and maintenance and enable users
to improve the utilization of their network
resources.
API Application Programming Interface. An interface
that specifies how applications interact with each
other.
REST Representational State Transfer. An architectural
style defined by Roy Fielding in his doctoral
dissertation Architectural Styles and the
Design of Network-based Software
Architectures (2000). The REST standard
defines:
Addressability: Unlike OOP objects, each
resource in REST has its unique
corresponding URI.
Interface uniformity: Unlike SOAP, REST
requires that all request methods be mapped
to HTTP methods (GET, PUT, DELETE, and
POST). Therefore, no service description
languages, such as WSDL, are required.
Statelessness: No client-specific information
is ever stored on the server. This makes it
much easier to horizontally scale applications
and makes the server more reliable.
Representational: The customers
communicate with the representations of
resources instead of the resources. A resource
can have multiple representations. Any
customer that is allocated the representation
of a resource has sufficient information for
processing underlying resources.
Connectedness: Any REST-based system can
predict the resources that customers need to
access and return the representations carrying
these resources. For example, the system can
return the RESTful operations in a hyperlink
to customers.
1.4.20 SAID
1.4.20.1 Introduction
Definition
System of active immunization and diagnosis (SAID) is an intelligent fault diagnosis system
that automatically diagnoses and rectifies severe device or service faults by simulating human
operations in troubleshooting.
Purpose
A network is prone to severe problems if it fails to recover from a service interruption. At
present, device reliability is implemented through various detection functions. Once a device
fault occurs, the device reports an alarm or requires a reset for fault recovery. However, this
mechanism is intended for fault detection of a single module. When a service interruption
occurs, the network may fail to promptly recover from the fault, adversely affecting services.
In addition, after receiving a reported fault, maintenance engineers may face a difficulty in
collecting fault information, preventing problem locating and adversely affecting device
maintenance.
The SAID is introduced to address the preceding issues. The SAID achieves automated device
fault diagnosis, fault information collection, and service recovery, comprehensively
improving the self-healing capability and maintainability of devices.
Benefits
The SAID can automatically detect, diagnose, and rectify device faults, greatly improving
network maintainability and reducing maintenance costs.
1.4.20.2 Principles
1.4.20.2.1 Basic SAID Functions
Basic Concepts
SAID node: detects, diagnoses, and rectifies faults on a device's modules in the SAID.
SAID nodes are classified into the following types:
− Module-level SAID node: defends against, detects, diagnoses, and rectifies faults on
a module.
− SAID-level SAID node: detects, diagnoses, and rectifies faults on multiple modules.
SAID node state machine: state triggered when a SAID node detects, diagnoses, and
rectifies faults. A SAID node involves seven states: initial, detecting, diagnosing,
invalid-diagnose, recovering, judging, and service exception states.
SAID tracing: The SAID collects and stores information generated when a SAID node
detects, diagnoses, and rectifies faults. The information can be used to locate the root
cause of a fault.
SAID
Fault locating in the SAID involves the fault detection, diagnosis, and recovery phases. The
SAID has multiple SAID nodes. Each time valid diagnosis is triggered (that is, the recovery
process has been triggered), the SAID records the diagnosis process information for fault
tracing. The SAID's main processes are described as follows:
1. Defense startup phase: After the system runs, it instructs modules to deploy fault defense
(for example, periodic logic re-loading and entry synchronization), starting the entire
device's fault defense.
2. Detection phase: A SAID node detects faults and finds prerequisites for problem
occurrence. Fault detection is classified as periodic detection (for example, periodic
traffic decrease detection) or triggered detection (for example, IS-IS Down detection).
3. Diagnosis phase: Once a SAID node detects a fault, the SAID node diagnoses the fault
and collects various fault entries to locate fault causes (only causes based on which
recovery measures can be taken need to be located).
4. Recovery phase: After recording information, the SAID node starts to rectify the fault by
level. After the recovery action is completed at each level, the SAID node determines
whether services recover (by determining whether the fault symptom disappears). If the
fault persists, the SAID node continues to perform the recovery action at the next level
until the fault is rectified. The recovery action is gradually performed from a lightweight
level to a heavyweight level.
5. Tracing phase: If the SAID determines the fault and its cause, this fault diagnosis is a
valid diagnosis. The SAID then records the diagnosis process. After entering the
recovery phase, the SAID records the recovery process for subsequent analysis.
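The escalating recovery logic of step 4 can be sketched as follows. The action names here are illustrative only; the predicate stands in for the check that the fault symptom has disappeared.

```python
# Sketch of the light-to-heavyweight recovery escalation (illustrative).

def recover(fault_cleared, actions):
    """actions: recovery callables ordered from lightweight to heavyweight.
    fault_cleared: predicate checked after each action (service recovery test).
    Returns the name of the action that cleared the fault, or None."""
    for action in actions:
        action()                      # perform the recovery action at this level
        if fault_cleared():
            return action.__name__    # fault symptom gone: stop escalating
    return None                       # service exception: keep checking periodically

# Toy usage: the lightweight subcard reset fails, the board reset clears the fault.
state = {'fault': True}
def reset_subcard(): pass                     # lightweight, no effect here
def reset_board():  state['fault'] = False    # heavyweight, clears the fault
assert recover(lambda: not state['fault'], [reset_subcard, reset_board]) == 'reset_board'
```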
13. If the service does not recover in the judging state and a secondary recovery action exists,
the SAID node enters the recovering state.
14. If the service does not recover in the judging state and no secondary recovery action
exists, the SAID node enters the service exception state.
15. In the service exception state, the SAID node periodically checks whether the service
recovers.
16. If the service recovers in the judging state, the SAID node enters the initial state.
Fault Cause
The failure to ping through a directly connected device often occurs on the network, causing
services to be interrupted for a long time and fail to automatically recover. The ping process
involves various IP forwarding phases. A ping failure may be caused by a hardware entry error,
board fault, or subcard fault on the local device or a fault on an intermediate device or the
peer device. Therefore, it is difficult to locate or demarcate the specific fault.
Definition
The ping service node is a specific SAID service node. This node performs link-heartbeat
loopback detection to detect service faults, diagnoses each ping forwarding phase to locate or
demarcate the fault, and takes corresponding recovery actions.
Principles
For details about the SAID framework and principles, see 1.4.20 SAID. The ping service node
undergoes four phases (fault detection, fault diagnosis, fault recovery, and service recovery
determination) to implement automatic device diagnosis, fault information collection, and
service recovery.
Fault detection
The ping service node performs link-heartbeat loopback detection to detect service faults.
Link-heartbeat loopback detection is classified as packet modification detection or
packet loss detection.
− Packet modification detection is to check whether the content of received heartbeat
packets is the same as the content of sent heartbeat packets.
− Packet loss detection is to check whether the difference between the number of
received heartbeat packets and the number of sent heartbeat packets is within the
permitted range.
After detecting packet modification or loss, the SAID triggers a message and sends it to
instruct the ping service node to diagnose the fault.
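The two link-heartbeat checks can be sketched as follows; the data structures and threshold parameter are illustrative, not the device's internal interfaces.

```python
# Sketch of link-heartbeat loopback detection (illustrative).

def detect(sent_pkts, recv_pkts, max_lost):
    """Classify a link-heartbeat loopback round.

    sent_pkts / recv_pkts: payloads of sent and looped-back heartbeat packets.
    max_lost: permitted difference between sent and received packet counts."""
    # Packet loss detection: compare counters against the permitted range.
    if len(sent_pkts) - len(recv_pkts) > max_lost:
        return 'packet_loss'
    # Packet modification detection: looped-back content must match what was sent.
    for got, want in zip(recv_pkts, sent_pkts):
        if got != want:
            return 'packet_modification'
    return 'ok'   # no trigger; the node stays in the detection phase

assert detect([b'hb1', b'hb2'], [b'hb1', b'hb2'], max_lost=0) == 'ok'
assert detect([b'hb1', b'hb2'], [b'hb1'], max_lost=0) == 'packet_loss'
assert detect([b'hb1'], [b'hbX'], max_lost=1) == 'packet_modification'
```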
Fault diagnosis
After receiving the triggered message in the fault detection state, the ping service node
enters the diagnosis state for fault diagnosis.
In the fault diagnosis state, the ping service node performs interface loopback detection
to determine whether the local device is faulty. If yes, the node enters the fault recovery
state. If not, the node generates an alarm (only a packet modification alarm) and returns
to the fault detection state.
Fault recovery
If a loopback detection fault occurs, the ping service node determines whether a counting
error occurs on the associated subcard.
− If a counting error occurs, the ping service node resets the subcard for service
recovery. Then, the node enters the service recovery determination state and
performs link-heartbeat loopback detection to determine whether services recover.
If services recover, the node returns to the fault detection state. If services do not
recover, the node returns to the fault recovery state and takes a secondary recovery
action. (For a subcard reset, the secondary recovery action is board reset.)
− If no counting error occurs, the ping service node resets the involved board for
service recovery. After the board starts, the node enters the service recovery
determination state and performs link-heartbeat loopback detection to determine
whether services recover. If services recover, the node returns to the fault detection
state. If services do not recover, the node remains in the service recovery
determination state and periodically performs link-heartbeat loopback detection
until services recover.
Service recovery determination
After fault recovery is complete, the ping service node uses the fault packet template to
send diagnostic packets. If a fault still exists, the node generates an alarm. If no fault
exists, the node instructs the link heartbeat to return to the initial state, and the node
itself returns to the fault detection state.
Terms
None.
Abbreviation
Abbreviation Full Spelling
SAID System of Active Immunization and Diagnosis
1.4.21 KPI
1.4.21.1 Introduction
Definition
Key performance indicators (KPIs) indicate the performance of a running device at a specific
time. A KPI may be obtained by aggregating multiple levels of KPIs. The KPI data collected
by the master MPU and LPUs is saved as an xxx.dat file and stored on the CF card of the
master MPU. The KPI parsing tool parses the file according to a predefined parsing format
and converts it into an Excel file. The Excel file provides relevant fault and service
impairment information, facilitating fault locating.
Purpose
The KPI system records key device KPIs in real time, provides service impairment
information (for example, the fault generation time, service impairment scope/type, relevant
operation, and possible fault cause/location), and supports fast fault locating.
Benefits
The KPI system helps carriers quickly learn service impairment information and locate faults,
so that they can effectively improve network maintainability and reduce maintenance costs.
1.4.21.2 Principles
KPI System
Key performance indicators (KPIs) are periodically collected at a specified time, which
slightly increases memory and CPU usage. If a large number of KPIs are collected, however,
services may be seriously affected. Therefore, when memory or CPU usage exceeds 70%, the
system collects KPIs of only the CP-CAR traffic, message-queue CurLen, Memory Usage,
and CPU Usage objects, which do not increase memory or CPU usage.
The KPI system checks whether the receiving buffer area has data every 30 minutes. If the
receiving buffer area has data, the system writes the data into a data file and checks whether
the data file size is greater than or equal to 4 MB. If it is, the system compresses the file
into a package named in the yyyy-mm-dd.hh-mm-ss.dat.zip format. After the compression is
complete, the system deletes the data file.
The KPI system obtains information about the size of the remaining CF card space each time
a file is generated.
If the remaining CF card space is less than or equal to 50 MB, the KPI system deletes the
oldest packages compressed from data files.
If the remaining CF card space is greater than 50 MB, the KPI system obtains data files
from the cfcard2:/KPISTAT path and computes the total space used by all the
packages compressed from data files. If the space usage is greater than or equal to 110
MB, the KPI system deletes the oldest packages.
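The housekeeping rules above (4 MB compression threshold, 50 MB minimum free CF card space, 110 MB cap on package space) can be sketched as a decision function. The helper is illustrative; it is not the device's real interface.

```python
# Sketch of the KPI file housekeeping rules described above (illustrative).

COMPRESS_THRESHOLD = 4 * 1024 * 1024      # 4 MB data file triggers compression
MIN_FREE_SPACE     = 50 * 1024 * 1024     # 50 MB minimum remaining CF card space
MAX_PACKAGE_SPACE  = 110 * 1024 * 1024    # 110 MB cap on total package space

def housekeeping(data_file_size, free_space, package_total):
    """Return the list of actions the KPI system would take."""
    actions = []
    if data_file_size >= COMPRESS_THRESHOLD:
        # Compress as yyyy-mm-dd.hh-mm-ss.dat.zip, then delete the data file.
        actions.append('compress_and_delete_data_file')
    if free_space <= MIN_FREE_SPACE:
        actions.append('delete_oldest_packages')
    elif package_total >= MAX_PACKAGE_SPACE:
        actions.append('delete_oldest_packages')
    return actions

assert housekeeping(5 * 1024**2, 200 * 1024**2, 10 * 1024**2) == ['compress_and_delete_data_file']
assert housekeeping(0, 40 * 1024**2, 0) == ['delete_oldest_packages']
```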
1. The KPI system provides a registration mechanism for service modules. After the
modules register, the system collects service data at the specific collection time through
periodic collection and storage interfaces.
2. When the collection period of a service module expires, the KPI system invokes the
module to collect data. The module converts the collected data into a desired KPI packet
format and saves the data on the MPU through the interface provided by the KPI system.
3. The KPI parsing tool parses the file based on a predefined format and converts the file
into an Excel one.
KPI Categories
KPIs are categorized as access service, traffic monitoring, system, unexpected packet loss,
resource, or value-added service KPIs. The monitoring period can be 1, 5, 10, 15, or 30
minutes. At present, chips (for example, NP and TM chips), services (for example, QoS), and
boards (for example, MPUs, LPUs, and subcards) support KPI collection.
Table 1-35 provides KPI examples.
Each example record in the table contains the device name, loopback IP address, file type,
collection timestamp, software version (V800R009C10SPC200), date, chassis/slot/module
location, KPI object and name (for example, CPU Usage or Memory Usage of the System
object), KPI type (Total), collection interval (30 minutes), record mode (Always), threshold
(NA), and KPI value with its unit (for example, 60% CPU usage and 16% memory usage).
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
Network planning engineers
Commissioning engineers
Data configuration engineers
System maintenance engineers
Security Declaration
Encryption algorithm declaration
The encryption algorithms DES/3DES/SKIPJACK/RC2/RSA (RSA-1024 or
lower)/MD2/MD4/MD5 (in digital signature scenarios and password encryption)/SHA1
(in digital signature scenarios) have low security and may bring security risks. If the
protocols allow, using more secure encryption algorithms, such as AES/RSA
(RSA-2048 or higher)/SHA2/HMAC-SHA2, is recommended.
Password configuration declaration
− Do not set both the start and end characters of a password to "%^%#". This causes
the password to be displayed directly in the configuration file.
− To further improve device security, periodically change the password.
Personal data declaration
Your purchased products, services, or features may use some of users' personal data during
service operation or fault locating. You must define user privacy policies in compliance
with local laws and take proper measures to fully protect personal data.
Feature declaration
− The NetStream feature may be used to analyze the communication information of
terminal customers for network traffic statistics and management purposes. Before
enabling the NetStream feature, ensure that it is performed within the boundaries
permitted by applicable laws and regulations.
Special Declaration
This document serves only as a guide. The content is written based on device
information gathered under lab conditions. The content provided by this document is
intended to be taken as general guidance, and does not cover all scenarios. The content
provided by this document may be different from the information on user device
interfaces due to factors such as version upgrades and differences in device models,
board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are
beyond the scope of this document.
The maximum values provided in this document are obtained in specific lab
environments (for example, only a certain type of board or protocol is configured on a
tested device). The actually obtained maximum values may be different from the
maximum values provided in this document due to factors such as differences in
hardware configurations and carried services.
Interface numbers used in this document are examples. Use the existing interface
numbers on devices for configuration.
The pictures of hardware in this document are for reference only.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
Indicates an imminently hazardous situation which, if not
avoided, will result in death or serious injury.
Symbol Description
Indicates a potentially hazardous situation which, if not
avoided, may result in minor or moderate injury.
Change History
Updates between document issues are cumulative. Therefore, the latest document issue
contains all updates made in previous issues.
Changes in Issue 03 (2017-09-20)
This issue is the third official release. The software version of this issue is
V800R009C10SPC200.
Changes in Issue 02 (2017-07-30)
This issue is the second official release. The software version of this issue is
V800R009C10SPC100.
Changes in Issue 01 (2017-05-30)
This issue is the first official release. The software version of this issue is
V800R009C10.
1.5.2 BFD
1.5.2.1 Introduction
Definition
Bidirectional Forwarding Detection (BFD) is a fault detection protocol that can quickly
determine a communication failure between devices and notify upper-layer applications.
Purpose
To minimize the impact of device faults on services and improve network availability, a
network device must be able to quickly detect faults in communication with adjacent devices.
Measures can then be taken to promptly rectify the faults to ensure service continuity.
On a live network, link faults can be detected using either of the following mechanisms:
Hardware detection: For example, the Synchronous Digital Hierarchy (SDH) alarm
function can be used to quickly detect link faults.
Hello detection: If hardware detection is unavailable, Hello detection can be used to
detect link faults.
However, the two mechanisms have the following issues:
Only certain media support hardware detection.
Hello detection takes more than 1 second to detect a fault. When traffic is transmitted at
gigabit rates, such slow detection causes packet loss.
On a Layer 3 network, the Hello packet detection mechanism cannot detect faults for all
routes, such as static routes.
BFD resolves these issues by providing:
A low-overhead, short-duration method to detect faults on the path between adjacent
forwarding engines. The faults can be interface, data link, and even forwarding engine
faults.
A single, unified mechanism to monitor any media and protocol layers in real time.
Benefits
BFD offers the following benefits:
Improved network performance and reliability
Improved user experience
1.5.2.2 Principles
1.5.2.2.1 Basic Concepts
Bidirectional Forwarding Detection (BFD) detects communication faults between forwarding
engines. Specifically, BFD checks the continuity of a data protocol on the path between
systems. The path can be a physical or logical link or a tunnel.
BFD interacts with upper-layer applications in the following manner:
An upper-layer application provides BFD with parameters, such as the detection address
and time.
BFD creates, deletes, or modifies sessions based on these parameters and notifies the
upper-layer application of the session status.
BFD has the following characteristics:
Provides a low-overhead, short-duration method to detect faults on the path between
adjacent forwarding engines.
Provides a single, unified mechanism to monitor any media and protocol layers in real
time.
The following sections describe the basic principles of BFD, including the BFD detection
mechanism, detected link types, session establishment modes, and session management.
Static mode: BFD session parameters, such as the local and remote discriminators, are
manually configured and delivered for BFD session establishment.
NOTE
In static mode, configure unique local and remote discriminators for each
BFD session. This mode prevents incorrect discriminators from affecting
BFD sessions that have correct discriminators and prevents BFD sessions
from alternating between Up and Down.
Figure 1-142 shows the status change process of the state machine during the establishment of
a BFD session.
1. BFD configured on both Device A and Device B independently starts state machines.
The initial status of BFD state machines is Down. Device A and Device B send BFD
control packets with the State field set to Down. If BFD sessions are established in static
mode, the value of Your Discriminator in BFD control packets is manually specified. If
BFD sessions are established in dynamic mode, the value of Your Discriminator is set to
0.
2. After receiving a BFD control packet with the State field set to Down, Device B switches
the session status to Init and sends a BFD control packet with the State field set to Init.
After the local BFD session status of Device B changes to Init, Device B no longer processes the
received BFD control packets with the State field set to Down.
3. The BFD session status change of Device A is the same as that of Device B.
4. After receiving a BFD control packet with the State field set to Init, Device B changes
the local session status to Up.
5. The BFD session status change of Device A is the same as that of Device B.
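The Down-Init-Up handshake in steps 1 through 5 can be sketched as a minimal state machine. States and transitions follow the text above; this is an illustration, not device code.

```python
# Minimal sketch of the BFD session state machine handshake (illustrative).

class BfdSession:
    def __init__(self):
        self.state = 'Down'          # initial state of the state machine

    def receive(self, remote_state):
        """Process the State field of a received BFD control packet."""
        if self.state == 'Down' and remote_state == 'Down':
            self.state = 'Init'      # step 2: answer a Down packet with Init
        elif self.state == 'Down' and remote_state == 'Init':
            self.state = 'Up'
        elif self.state == 'Init' and remote_state == 'Init':
            self.state = 'Up'        # step 4: Init received, session goes Up
        elif self.state == 'Init' and remote_state == 'Down':
            pass                     # per the note: Down packets are ignored in Init
        return self.state

# Both ends start independently, exchange Down then Init, and come Up.
a, b = BfdSession(), BfdSession()
a.receive('Down')    # A receives B's Down packet -> Init
b.receive('Down')    # B receives A's Down packet -> Init
a.receive('Init')    # A receives B's Init packet -> Up
b.receive('Init')    # B receives A's Init packet -> Up
assert a.state == b.state == 'Up'
```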
Typical application 2:
As shown in Figure 1-144, BFD monitors the multi-hop IPv4 path between Device A and
Device C, and BFD sessions are bound only to peer IP addresses.
Typical application 4:
As shown in Figure 1-146, BFD monitors the multi-hop IPv6 path between Device A and
Device C, and BFD sessions are bound only to peer IP addresses.
In BFD for IP scenarios, BFD for PST is configured on a device. If a link fault occurs, BFD
detects the fault and triggers the PST to go Down. If the device restarts and the link fault
persists, BFD is in the AdminDown state and does not notify the PST of BFD Down. As a
result, the PST is not triggered to go Down and the interface bound to BFD is still Up.
to the BFD module. In this manner, the BFD module detects that the link is normal. If
multicast BFD packets are sent over a trunk member link, they are delivered to the data link
layer for link continuity check. The remote IP address used in a multicast BFD session is the
default known multicast IP address (224.0.0.107 to 224.0.0.250). Any packet with the default
known multicast IP address is sent to the BFD module for IP forwarding.
Usage Scenario
As shown in Figure 1-147, multicast BFD is configured on both Device A and Device B. BFD
sessions are bound to the outbound interface If1, and the default multicast address is used.
After the configuration is complete, multicast BFD quickly checks the continuity of the link
between interfaces.
Usage Scenario
In Figure 1-148, a BFD session is established between Device A and Device B, and the
default multicast address is used to check the continuity of the single-hop link connected to
the interface If1. After BFD for PIS is configured and BFD detects a link fault, BFD
immediately sends a message indicating the Down state to the associated interface. The
interface then enters the BFD Down state.
On the network shown in Figure 1-149, a BFD for link-bundle session consists of one main
session and multiple sub-sessions.
Each sub-session independently monitors an Eth-Trunk member interface and reports the
monitoring results to the main session. Each sub-session uses the same monitoring
parameters as the main session.
The main session creates a BFD sub-session for each Eth-Trunk member interface,
summarizes the sub-session monitoring results, and determines the status of the
Eth-Trunk.
− The main session is Up so long as a sub-session is Up.
− If no sub-session is available, the main session goes Down and the Unknown state
is reported to applications. The status of the Eth-Trunk port is not changed.
− If the Eth-Trunk has only one member interface and the corresponding sub-session
is Up, the main session goes Down when the member interface exits the Eth-Trunk.
The status of the Eth-Trunk is Up.
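The aggregation rules above can be sketched as a summary function: the main session is Up as long as any sub-session is Up, and Unknown is reported when no sub-session is available. This is an illustrative sketch, not the device implementation.

```python
# Sketch of how the main session summarizes sub-session monitoring results.

def main_session_state(sub_states):
    """sub_states: per-member sub-session states ('Up'/'Down').
    Returns the state reported to applications."""
    if not sub_states:
        return 'Unknown'      # no sub-session available: report Unknown
    if any(s == 'Up' for s in sub_states):
        return 'Up'           # main session is Up so long as one sub-session is Up
    return 'Down'

assert main_session_state(['Down', 'Up', 'Down']) == 'Up'
assert main_session_state(['Down', 'Down']) == 'Down'
assert main_session_state([]) == 'Unknown'
```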
The main session's local discriminator is allocated from the range from 0x00100000 to
0x00103fff without occupying the original BFD session discriminator range. The main
session does not learn the remote discriminator because it does not send or receive packets. A
sub-session's local discriminator is allocated from the original dynamic BFD session
discriminator range using the same algorithm as a dynamic BFD session.
Only sub-sessions consume BFD session resources per board. A sub-session must select the
board on which the physical member interface bound to this sub-session resides as a state
machine board. If no BFD session resources are available on the board, board selection fails.
In this situation, the sub-session's status is not used to determine the main session's status.
The process of establishing a passive BFD echo session as shown in Figure 1-150 is as
follows:
1. Device B functions as a BFD session initiator and sends an asynchronous BFD packet to
Device A. The Required Min Echo RX Interval field carried in the packet is a nonzero
value, which specifies that Device A must support BFD echo.
2. After receiving the packet, Device A finds that the value of the Required Min Echo RX
Interval field carried in the packet is a nonzero value. If Device A has passive BFD echo
enabled, it checks whether any ACL that restricts passive BFD echo is referenced. If an
ACL is referenced, only BFD sessions that match specific ACL rules can enter the
asynchronous echo mode. If no ACL is referenced, BFD sessions immediately enter the
asynchronous echo mode.
3. Device B periodically sends BFD echo packets, and Device A sends BFD echo packets
(the source and destination IP addresses are the local IP address, and the destination
physical address is Device B's physical address) at the interval specified by the Required
Min RX Interval field. Both Device A and Device B start a receive timer, with a receive
interval that is the same as the interval at which they each send BFD echo packets.
4. After Device A and Device B receive BFD echo packets from each other, they
immediately loop back the packets at the forwarding layer. Device A and Device B also
send asynchronous BFD packets to each other at an interval that is much less than that
for sending echo packets.
Similarities and Differences Between Passive BFD Echo and One-Arm BFD Echo
To ensure that passive BFD echo or one-arm BFD echo can take effect, disable strict URPF
on devices that send BFD echo packets.
Strict URPF prevents attacks that use spoofed source IP addresses. If strict URPF is enabled
on a device, the device obtains the source IP address and inbound interface of a packet and
searches the forwarding table for an entry with the destination IP address set to the source IP
address of the packet. The device then checks whether the outbound interface for the entry
matches the inbound interface. If they do not match, the device considers the source IP
address invalid and discards the packet. After a device enabled with strict URPF receives a
BFD echo packet that is looped back, it checks the source IP address of the packet. As the
source IP address of the echo packet is a local IP address of the device, the packet is sent to
the platform without being forwarded at the lower layer. As a result, the device considers the
packet invalid and discards it.
Table 1-42 Differences between BFD echo sessions and common static single-hop sessions

Common static single-hop session:
− Supported IP: IPv4 and IPv6
− Session type: Static single-hop session
− Discriminator: Both the local discriminator (MD) and remote discriminator (YD) must be configured.
− Negotiation prerequisite: A matching session must be established on the peer.
− IP header: The source and destination IP addresses are different.

Passive BFD echo session:
− Supported IP: IPv4 and IPv6
− Session type: Dynamic single-hop session
− Discriminator: Neither MD nor YD needs to be configured.
− Negotiation prerequisite: A matching session must be established, and echo must be enabled on the peer.
− IP header: Both the source and destination IP addresses are a local IP address of the device.

One-arm BFD echo session:
− Supported IP: IPv4
− Session type: Static single-hop session
− Discriminator: Only MD needs to be configured (MD and YD are the same).
− Negotiation prerequisite: A matching session does not need to be established on the peer.
− IP header: Both the source and destination IP addresses are a local IP address of the device.
− If a single-hop BFD session is established and the session is bound to a board that is
BFD-incapable in hardware but BFD-capable in software, the BFD session can be
processed by this board.
Integrated mode
If single-hop BFD sessions are established and the sessions are bound to boards that are
BFD-incapable in hardware but BFD-capable in software, the sessions will be distributed
to the two load-balancing integrated boards. The load-balancing integrated board with
more available BFD resources will be preferentially selected.
Boards that are BFD-incapable in hardware but BFD-capable in software are selected under the following
conditions:
If boards that are BFD-incapable in hardware but BFD-capable in software are already selected and the
integrated mode is configured, sessions will enter the AdminDown state and then be bound to an
integrated board.
Table 1-43 describes the board selection rules for BFD sessions.
the event. As a result, the upper-layer protocol frequently flaps. BFD dampening prevents link
flapping detected by BFD from causing the frequent flapping of the upper-layer protocol.
BFD dampening enables the BFD session's next negotiation to be delayed if the number of
times that a BFD session flaps reaches a threshold. However, IGP and MPLS negotiation is
not affected. Specifically, if a BFD session that is always flapping goes Down, its next
negotiation is delayed, reducing the number of times that the BFD session flaps.
1.5.2.3 Applications
1.5.2.3.1 BFD for Static Routes
Different from dynamic routing protocols, static routes do not have a detection mechanism. If
a fault occurs on a network, an administrator must manually address it. Bidirectional
Forwarding Detection (BFD) for static routes is introduced to associate a static route with a
BFD session so that the BFD session can detect the status of the link that the static route
passes through.
After BFD for static routes is configured, each static route can be associated with a BFD
session. In addition to route selection rules, whether a static route can be selected as the
optimal route is subject to BFD session status.
If a BFD session associated with a static route detects a link failure when the BFD
session is Down, the BFD session reports the link failure to the system. The system then
deletes the static route from the IP routing table.
If a BFD session associated with a static route detects that a faulty link recovers when
the BFD session is Up, the BFD session reports the fault recovery to the system. The
system then adds the static route to the IP routing table again.
By default, a static route can still be selected even though the BFD session associated
with it is AdminDown (triggered by the shutdown command run either locally or
remotely). If a device is restarted, the BFD session needs to be re-negotiated. In this case,
whether the static route associated with the BFD session can be selected as the optimal
route is subject to the re-negotiated BFD session status.
BFD for static routes has two detection modes:
Single-hop detection
In single-hop detection mode, the configured outbound interface and next hop address
are the information about the directly connected next hop. The outbound interface
associated with the BFD session is the outbound interface of the static route, and the peer
address is the next hop address of the static route.
Multi-hop detection
In multi-hop detection mode, only the next hop address is configured. Therefore, the
static route must be iterated to the directly connected next hop and outbound interface.
The peer address of the BFD session is the original next hop address of the static route,
and the outbound interface is not specified. In most cases, the original next hop to be
iterated is an indirect next hop. Multi-hop detection is performed on the static routes that
support route iteration.
For details about BFD, see the HUAWEI NE20E-S2 Universal Service Router Feature Description -
Reliability.
Background
Routing Information Protocol (RIP)-capable devices monitor the neighbor status by
exchanging Update packets periodically. During the time it takes local devices to detect link failures,
carriers or users may lose a large number of packets. Bidirectional forwarding detection (BFD)
for RIP can speed up fault detection and route convergence, which improves network
reliability.
After BFD for RIP is configured on the router, BFD can detect a fault (if any) within
milliseconds and notify the RIP module of the fault. The router then deletes the route that
passes through the faulty link and switches traffic to a backup link. This process speeds up
RIP convergence.
Table 1-44 describes the differences before and after BFD for RIP is configured.
Table 1-44 Differences before and after BFD for RIP is configured
Related Concepts
The BFD mechanism bidirectionally monitors data protocol connectivity over the link
between two routers. After BFD is associated with a routing protocol, BFD can rapidly detect
a fault (if any) and notify the protocol module of the fault, which speeds up route convergence
and minimizes traffic loss.
BFD is classified into the following modes:
Static BFD
In static BFD mode, BFD session parameters (including local and remote discriminators)
must be configured, and requests must be delivered manually to establish BFD sessions.
Static BFD is applicable to networks on which only a few links require high reliability.
Dynamic BFD
In dynamic BFD mode, the establishment of BFD sessions is triggered by routing
protocols, and the local discriminator is dynamically allocated, while the remote
discriminator is obtained from BFD packets sent by the neighbor.
When a new neighbor relationship is set up, a BFD session is established based on the
neighbor and detection parameters, including source and destination IP addresses. When
a fault occurs on the link, the routing protocol associated with BFD can detect the BFD
session Down event. Traffic is switched to the backup link immediately, which
minimizes data loss.
Dynamic BFD is applicable to networks that require high reliability.
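The dynamic mode described above (local discriminator allocated, remote discriminator learned from the neighbor's packets) drives a session state machine. The sketch below assumes RFC 5880 semantics for the Down/Init/Up transitions, which the document does not spell out; class and method names are illustrative.

```python
# Sketch of how a dynamic BFD session learns its remote discriminator and
# moves through the Down -> Init -> Up state machine (RFC 5880 semantics
# assumed; AdminDown omitted for brevity).
DOWN, INIT, UP = "Down", "Init", "Up"

class DynamicBfdSession:
    def __init__(self, local_disc):
        self.local_disc = local_disc   # dynamically allocated locally
        self.remote_disc = 0           # learned from the neighbor's packets
        self.state = DOWN

    def on_packet(self, sender_disc, sender_state):
        self.remote_disc = sender_disc            # dynamic learning
        if self.state == DOWN:
            if sender_state == DOWN:
                self.state = INIT
            elif sender_state == INIT:
                self.state = UP
        elif self.state == INIT:
            if sender_state in (INIT, UP):
                self.state = UP
        elif self.state == UP:
            if sender_state == DOWN:
                self.state = DOWN
```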
Implementation
For details about BFD implementation, see "BFD" in Universal Service Router Feature
Description - Reliability. Figure 1-151 shows a typical network topology for BFD for RIP.
Dynamic BFD for RIP implementation:
a. RIP neighbor relationships are established among Device A, Device B, and Device
C and between Device B and Device D.
b. BFD for RIP is enabled on Device A and Device B.
c. Device A calculates routes, and the next hop along the route from Device A to
Device D is Device B.
d. If a fault occurs on the link between Device A and Device B, BFD will rapidly
detect the fault and report it to Device A. Device A then deletes the route whose
next hop is Device B from the routing table.
e. Device A recalculates routes and selects a new path Device C → Device B →
Device D.
f. After the link between Device A and Device B recovers, a new BFD session is
established between the two routers. Device A then reselects an optimal link to
forward packets.
Static BFD for RIP implementation:
a. RIP neighbor relationships are established among Device A, Device B, and Device
C and between Device B and Device D.
b. Static BFD is configured on the interface that connects Device A to Device B.
c. If a fault occurs on the link between Device A and Device B, BFD will rapidly
detect the fault and report it to Device A. Device A then deletes the route whose
next hop is Device B from the routing table.
d. After the link between Device A and Device B recovers, a new BFD session is
established between the two routers. Device A then reselects an optimal link to
forward packets.
Usage Scenario
BFD for RIP is applicable to networks that require high reliability.
Benefits
BFD for RIP improves network reliability and enables devices to rapidly detect link faults,
which speeds up route convergence on RIP networks.
Definition
Bidirectional Forwarding Detection (BFD) is a mechanism to detect communication faults
between forwarding engines.
To be specific, BFD detects the connectivity of a data protocol along a path between two
systems. The path can be a physical link, a logical link, or a tunnel.
In BFD for OSPF, a BFD session is associated with OSPF. The BFD session quickly detects a
link fault and then notifies OSPF of the fault, which speeds up OSPF's response to network
topology changes.
Purpose
A link fault or a topology change causes routers to recalculate routes. Routing protocol
convergence must be as quick as possible to improve network availability. Link faults are
inevitable, and therefore a solution must be provided to quickly detect faults and notify
routing protocols.
BFD for Open Shortest Path First (OSPF) associates BFD sessions with OSPF. After BFD for
OSPF is configured, BFD quickly detects link faults and notifies OSPF of the faults. BFD for
OSPF accelerates OSPF response to network topology changes.
Table 1-45 describes OSPF convergence speeds before and after BFD for OSPF is configured.
Table 1-45 OSPF convergence speeds before and after BFD for OSPF is configured
Principles
Figure 1-152 shows a typical network topology with BFD for OSPF configured. The
principles of BFD for OSPF are described as follows:
1. OSPF neighbor relationships are established among the three routers.
2. After a neighbor relationship becomes Full, a BFD session is established.
3. The outbound interface on Device A connected to Device B is interface 1. If the link
between Device A and Device B fails, BFD detects the fault and then notifies Device A
of the fault.
4. Device A processes the event that a neighbor relationship has become Down and
recalculates routes. The new route passes through Device C and reaches Device B, with
interface 2 as the outbound interface.
Definition
Bidirectional Forwarding Detection (BFD) is a mechanism to detect communication faults
between forwarding engines.
To be specific, BFD detects the connectivity of a data protocol along a path between two
systems. The path can be a physical link, a logical link, or a tunnel.
In BFD for OSPFv3, a BFD session is associated with OSPFv3. The BFD session quickly
detects a link fault and then notifies OSPFv3 of the fault, which speeds up OSPFv3's response
to network topology changes.
Purpose
A link fault or a topology change causes routers to recalculate routes. Routing protocol
convergence must be as quick as possible to improve network availability. Link faults are
inevitable, and therefore a solution must be provided to quickly detect faults and notify
routing protocols.
BFD for Open Shortest Path First version 3 (OSPFv3) associates BFD sessions with OSPFv3.
After BFD for OSPFv3 is configured, BFD quickly detects link faults and notifies OSPFv3 of
the faults. BFD for OSPFv3 accelerates OSPFv3 response to network topology changes.
Table 1-46 describes OSPFv3 convergence speeds before and after BFD for OSPFv3 is
configured.
Table 1-46 OSPFv3 convergence speeds before and after BFD for OSPFv3 is configured
Principles
Figure 1-153 shows a typical network topology with BFD for OSPFv3 configured. The
principles of BFD for OSPFv3 are described as follows:
1. OSPFv3 neighbor relationships are established among the three routers.
2. After a neighbor relationship becomes Full, a BFD session is established.
3. The outbound interface on Device A connected to Device B is interface 1. If the link
between Device A and Device B fails, BFD detects the fault and then notifies Device A
of the fault.
4. Device A processes the event that a neighbor relationship has become Down and
recalculates routes. The new route passes through Device C and reaches Device B, with
interface 2 as the outbound interface.
A device can detect neighbor faults only at the level of seconds. As a result, link faults on a
high-speed network may cause a large number of packets to be discarded.
BFD, which can be used to detect link faults on lightly loaded networks at the millisecond
level, is introduced to resolve the preceding issue. With BFD, two systems periodically send
BFD packets to each other. If a system does not receive BFD packets from the other end
within a specified period, the system considers the bidirectional link between them Down.
BFD is classified into the following modes:
Static BFD
In static BFD mode, BFD session parameters (including local and remote discriminators)
are set using commands, and requests must be delivered manually to establish BFD
sessions.
Dynamic BFD
In dynamic BFD mode, the establishment of BFD sessions is triggered by routing
protocols.
BFD for IS-IS enables BFD sessions to be dynamically established. After detecting a fault,
BFD notifies IS-IS of the fault. IS-IS sets the neighbor status to Down, quickly updates link
state protocol data units (LSPs), and performs the partial route calculation (PRC). BFD for
IS-IS implements fast IS-IS route convergence.
Instead of replacing the Hello mechanism of IS-IS, BFD works with IS-IS to rapidly detect the faults
that occur on neighboring devices or links.
If a Level-1-2 neighbor relationship is set up between the devices on both ends of a link,
the following situations occur:
− On a broadcast network, IS-IS sets up a Level-1 BFD session and a Level-2 BFD
session.
− On a P2P network, IS-IS sets up only one BFD session.
Process of tearing down a BFD session
− P2P network
If the neighbor relationship established between P2P IS-IS interfaces is not Up,
IS-IS tears down the BFD session.
− Broadcast network
If the neighbor relationship established between broadcast IS-IS interfaces is not Up
or the DIS is reelected on the broadcast network, IS-IS tears down the BFD session.
If the configurations of dynamic BFD sessions are deleted or BFD for IS-IS is disabled
from an interface, all Up BFD sessions established between the interface and its
neighbors are deleted. If the interface is a DIS and the DIS is Up, all BFD sessions
established between the interface and its neighbors are deleted.
If BFD is disabled from an IS-IS process, BFD sessions are deleted from the process.
BFD detects only the one-hop link between IS-IS neighbors because IS-IS establishes only one-hop
neighbor relationships.
Usage Scenario
Dynamic BFD needs to be configured based on the actual network. If the time parameters are
not configured correctly, network flapping may occur.
BFD for IS-IS speeds up route convergence through rapid link failure detection. The
following is a networking example for BFD for IS-IS.
Networking
As shown in Figure 1-155, Device A and Device B belong to ASs 100 and 200, respectively.
The two routers are directly connected and establish an External Border Gateway Protocol
(EBGP) peer relationship.
BFD is enabled to detect the EBGP peer relationship between Device A and Device B. If the
link between Device A and Device B fails, BFD can quickly detect the fault and notify BGP.
Background
If a node or link along an LDP LSP that is transmitting traffic fails, traffic switches to a
backup LSP. The path switchover speed depends on the detection duration and traffic
switchover duration. A delayed path switchover causes traffic loss. LDP fast reroute (FRR)
can be used to speed up the traffic switchover, but not the detection process.
As shown in Figure 1-156, a local label switching router (LSR) periodically sends Hello
messages to notify each peer LSR of the local LSR's presence and establish a Hello adjacency
with each peer LSR. The local LSR constructs a Hello hold timer to maintain the Hello
adjacency with each peer. Each time the local LSR receives a Hello message, it updates the
Hello hold timer. If the Hello hold timer expires before a Hello message arrives, the LSR
considers the Hello adjacency disconnected. The Hello mechanism cannot rapidly detect link
faults, especially when a Layer 2 device is deployed between the local LSR and its peer.
The rapid, light-load BFD mechanism is used to quickly detect faults and trigger a
primary/backup LSP switchover, which minimizes data loss and improves service reliability.
BFD for LDP LSP is implemented by establishing a BFD session between two nodes on both
ends of an LSP and binding the session to the LSP. BFD rapidly detects LSP faults and
triggers a traffic switchover. When BFD monitors a unidirectional LDP LSP, the reverse path
of the LDP LSP can be an IP link, an LDP LSP, or a traffic engineering (TE) tunnel.
A BFD session that monitors LDP LSPs is negotiated in either static or dynamic mode:
Static configuration: The negotiation of a BFD session is performed using the local and
remote discriminators that are manually configured for the BFD session to be established.
On a local LSR, you can bind an LSP with a specified next-hop IP address to a BFD
session with a specified peer IP address.
Dynamic establishment: The negotiation of a BFD session is performed using the BFD
discriminator type-length-value (TLV) in an LSP ping packet. You must specify a policy
for establishing BFD sessions on a local LSR. The LSR automatically establishes BFD
sessions with its peers and binds the BFD sessions to LSPs using either of the following
policies:
− Host address-based policy: The local LSR uses all host addresses to establish BFD
sessions. You can specify a next-hop IP address and an outbound interface name of
LSPs and establish BFD sessions to monitor the specified LSPs.
− Forwarding equivalence class (FEC)-based policy: The local LSR uses host
addresses listed in a configured FEC list to automatically establish BFD sessions.
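The two session-establishment policies above amount to a filter over candidate destinations. A minimal sketch, with an assumed function name and data shapes:

```python
# Illustrative sketch of the two dynamic-establishment policies described
# above: decide which LSP destination addresses get automatically created
# BFD sessions.
def select_bfd_targets(host_addresses, fec_list=None):
    """Host address-based policy when fec_list is None; FEC-based otherwise."""
    if fec_list is None:
        return set(host_addresses)                 # all host addresses
    return set(host_addresses) & set(fec_list)     # only addresses in the FEC list
```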
BFD uses the asynchronous mode to check LSP continuity. That is, the ingress and egress
periodically send BFD packets to each other. If one end does not receive BFD packets from
the other end within a detection period, BFD considers the LSP Down and sends an LSP
Down message to the LSP management (LSPM) module.
Although BFD for LDP is enabled on a proxy egress, a BFD session cannot be established for the
reverse path of a proxy egress LSP on the proxy egress.
Usage Scenarios
BFD for LDP LSP can be used in the following scenarios:
Primary and bypass LDP FRR LSPs are established.
Primary and bypass virtual private network (VPN) FRR LSPs are established.
Benefits
BFD for LDP LSP provides a rapid, light-load fault detection mechanism for LDP LSPs,
which improves network reliability.
Benefits
No tunnel protection is provided in the NG-MVPN over P2MP TE function or VPLS over
P2MP TE function. If a tunnel fails, traffic can only be switched using route change-induced
hard convergence, which renders low performance. This function provides dual-root 1+1
protection for the NG-MVPN over P2MP TE function and VPLS over P2MP TE function. If a
P2MP TE tunnel fails, BFD for P2MP TE rapidly detects the fault and switches traffic, which
improves fault convergence performance and reduces traffic loss.
Principles
In Figure 1-157, BFD is enabled on the root PE1 and the backup root PE2. Leaf nodes UPE1
to UPE4 are enabled to passively create BFD sessions. Both PE1 and PE2 send BFD packets
to all leaf nodes along P2MP TE tunnels. The leaf nodes receive the BFD packets transmitted
only on the primary tunnel. If a leaf node receives detection packets within a specified
interval, the link between the root node and leaf node is working properly. If a leaf node fails
to receive BFD packets within a specified interval, the link between the root node and leaf
node fails. The leaf node then rapidly switches traffic to a protection tunnel, which reduces
traffic loss.
On the network shown in Figure 1-158, BFD is disabled. If LSRE fails, LSRA or LSRF
cannot promptly detect the fault because a Layer 2 switch exists between them. Although the
Hello mechanism detects the fault, detection lasts for a long time.
After BFD is configured, if LSRE fails, LSRA and LSRF detect the fault rapidly, and traffic switches to the path LSRA
-> LSRB -> LSRD -> LSRF.
BFD for TE detects faults in a CR-LSP. After detecting a fault in a CR-LSP, BFD for TE
immediately notifies the forwarding plane of the fault to rapidly trigger a traffic switchover.
BFD for TE is usually used together with a hot-standby CR-LSP.
The concepts associated with BFD are as follows:
Static BFD session: established by manually setting the local and remote discriminators.
The local discriminator on a local node must match the remote discriminator on a remote
node. The minimum intervals at which BFD packets are sent and received are
changeable after a static BFD session is established.
Dynamic BFD session: established without a local or remote discriminator specified.
After a routing protocol neighbor is established between the local and remote nodes, the
RM delivers parameters to instruct the BFD module to establish a BFD session. The two
nodes negotiate the local discriminator, remote discriminator, minimum interval at which
BFD packets are sent, and minimum interval at which BFD packets are received.
Detection period: an interval at which the system checks the BFD session status. If no
packet is received from the remote end within a detection period, the BFD session is
considered Down.
A BFD session is bound to a CR-LSP. A BFD session is set up between the ingress and egress.
A BFD packet is sent by the ingress to the egress along a CR-LSP. Upon receipt, the egress
responds to the BFD packet. The ingress can rapidly monitor the status of links through which
the CR-LSP passes based on whether a reply packet is received.
If a link fault is detected, BFD notifies the forwarding module of the fault. The forwarding
module searches for a backup CR-LSP and switches traffic to the backup CR-LSP. In addition,
the forwarding module reports the fault to the control plane. If dynamic BFD for TE CR-LSP
is used, the control plane proactively creates a BFD session to detect faults in the backup
CR-LSP. If static BFD for TE CR-LSP is used, a BFD session is created manually to detect
faults in the backup CR-LSP if necessary.
On the network shown in Figure 1-159, a BFD session is set up to detect faults in the link
through which the primary CR-LSP passes. If a link fault occurs, the BFD session on the
ingress immediately notifies the forwarding plane of the fault. The ingress switches traffic to
the bypass CR-LSP and sets up a new BFD session to detect faults in the bypass CR-LSP.
On the network shown in Figure 1-160, a primary CR-LSP is established along the path LSRA
-> LSRB, and a hot-standby CR-LSP is configured. A BFD session is set up between LSRA
and LSRB to detect faults in the primary CR-LSP. If a fault occurs on the primary CR-LSP,
the BFD session rapidly notifies LSRA of the fault. After receiving the fault information,
LSRA rapidly switches traffic to the hot-standby CR-LSP to ensure traffic continuity.
Background
When a Layer 2 device is deployed on a link between two RSVP nodes, an RSVP node can
only use the Hello mechanism to detect a link fault. For example, on the network shown in
Figure 1-161, a switch exists between P1 and P2. If a fault occurs on the link between the
switch and P2, P1 keeps sending Hello packets and detects the fault after it fails to receive
replies to the Hello packets. The fault detection latency causes seconds of traffic loss. To
minimize packet loss, BFD for RSVP can be configured. BFD rapidly detects a fault and
triggers TE FRR switching, which improves network reliability.
Implementation
BFD for RSVP monitors RSVP neighbor relationships.
Unlike BFD for CR-LSP and BFD for TE that support multi-hop BFD sessions, BFD for
RSVP establishes only single-hop BFD sessions between RSVP nodes to monitor the network
layer.
BFD for RSVP, BFD for OSPF, BFD for IS-IS, and BFD for BGP can share a BFD session.
When protocol-specific BFD parameters are set for a BFD session shared by RSVP and other
protocols, the smallest values take effect. The parameters include the minimum intervals at
which BFD packets are sent, minimum intervals at which BFD packets are received, and local
detection multipliers.
Usage Scenario
BFD for RSVP applies to a network on which a Layer 2 device exists between the TE FRR
point of local repair (PLR) on a bypass CR-LSP and an RSVP node on the primary CR-LSP.
Benefits
BFD for RSVP improves reliability on MPLS TE networks with Layer 2 devices.
Background
Devices in a VRRP backup group exchange VRRP Advertisement packets to negotiate the
master/backup status and implement backup. If the link between devices in a VRRP backup
group fails, VRRP Advertisement packets cannot be exchanged to negotiate the
master/backup status. A backup device attempts to preempt the Master state after a period
three times as long as the time interval at which VRRP Advertisement packets are broadcast.
During this period, user traffic is still forwarded to the master device, which results in user
traffic loss.
Bidirectional Forwarding Detection (BFD) can rapidly detect faults in links or IP routes. BFD
for VRRP enables a master/backup VRRP switchover to be completed within 1 second,
preventing user traffic loss. A BFD session is established between the master and backup
devices in a VRRP backup group and is bound to the VRRP backup group. BFD immediately
detects communication faults in the VRRP backup group and instructs the VRRP backup
group to perform a master/backup switchover, minimizing service interruptions.
As shown in Figure 1-162, a BFD session is established between Device A (master) and
Device B (backup) and is bound to a VRRP backup group. If BFD detects a fault on the link
between Device B and Device A, the BFD module notifies the VRRP module of the status
change. After receiving the notification, the VRRP module performs a master/backup VRRP
switchover.
Figure 1-162 Association between a VRRP backup group and a common BFD session
4. Device A in the Master state forwards user traffic, and Device B remains in the Backup
state.
The preceding process shows that BFD for VRRP is different from VRRP. After BFD for
VRRP is deployed and a fault occurs, a backup device immediately preempts the Master state
without waiting for a period three times as long as the time interval at which VRRP
Advertisement packets are broadcast. A master/backup VRRP switchover can be implemented
in milliseconds.
Association Between a VRRP Backup Group and Link and Peer BFD Sessions
As shown in Figure 1-163, the master and backup devices monitor the status of link and peer
BFD sessions to identify local or remote faults.
Device A and Device B run VRRP. A peer BFD session is established between Device A and
Device B to detect link and device failures. Link BFD sessions are established between
Device A and Device E and between Device B and Device E to detect link and device failures.
After Device B detects that the peer BFD session goes Down and the Link2 BFD session is Up,
Device B's VRRP status changes from Backup to Master, and Device B takes over.
Figure 1-163 Association between a VRRP backup group and link and peer BFD sessions
− Link1 or Device E fails. Link1 BFD session and the peer BFD session go Down.
Link2 BFD session is Up.
Device A's VRRP status directly becomes Initialize.
Device B's VRRP status directly becomes Master.
− Device A fails. Link1 BFD session and the peer BFD session go Down. Link2 BFD
session is Up. Device B's VRRP status becomes Master.
3. After the fault is rectified, the BFD sessions go Up, and Device A and Device B restore
their VRRP status.
A Link2 fault does not affect Device A's VRRP status, and Device A continues to forward upstream
traffic. However, Device B's VRRP status becomes Master if both the peer BFD session and Link2 BFD
session go Down, and Device B detects the peer BFD session status change before detecting the Link2
BFD session status change. After Device B detects the Link2 BFD session status change, Device B's
VRRP status becomes Initialize.
Figure 1-164 shows the state machine for the association between a VRRP backup group and
link and peer BFD sessions.
Figure 1-164 State machine for the association between a VRRP backup group and link and peer
BFD sessions
The preceding process shows that, after link and peer BFD for VRRP is deployed, the backup
device immediately preempts the Master state if a fault occurs. Link and peer BFD for VRRP
implements a millisecond-level master/backup VRRP switchover.
Benefits
BFD for VRRP speeds up master/backup VRRP switchovers if faults occur.
Service Overview
Bidirectional Forwarding Detection (BFD) for pseudo wire (PW) monitors PW connectivity
on a Layer 2 virtual private network (L2VPN) and informs the L2VPN of any detected faults.
Upon receiving a fault notification from BFD, the L2VPN performs a primary/secondary PW
switchover to protect services.
Static BFD for PW has two modes: time to live (TTL) and non-TTL.
The two static BFD for PW modes are described as follows:
Static BFD for PW in TTL mode: The TTL of BFD packets is automatically calculated or
manually configured. BFD packets are encapsulated with PW labels and transmitted over
PWs. A PW can either have the control word enabled or not. The usage scenarios of
static BFD for PW in TTL mode are as follows:
− Static BFD for single-segment PW (SS-PW): Two BFD-enabled nodes negotiate a
BFD session based on the configured peer address and TTL (the TTL for SS-PWs is
1) and exchange BFD packets to monitor PW connectivity.
− Static BFD for multi-segment PW (MS-PW): The remote peer address of the
MS-PW to be detected must be specified. BFD packets can pass through multiple
superstratum provider edge devices (SPEs) to reach the destination, regardless of
whether the control word is enabled for the PW.
Static BFD for PW in non-TTL mode: The TTL of BFD packets is fixed at 255. BFD
packets are encapsulated with PW labels and transmitted over PWs. A PW must have the
control word enabled and differentiate control packets from data packets by checking
whether these packets carry the control word. Static BFD for PW in non-TTL mode can
detect only end-to-end (E2E) SS-PWs.
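The TTL rules for the two static BFD for PW modes can be summarized in a short sketch (illustrative Python, not device software; the helper name and parameters are hypothetical):

```python
def bfd_pw_ttl(mode, hop_count=1):
    """Return the TTL to set in BFD packets for a PW, following the
    two static BFD for PW modes described above. Illustrative only."""
    if mode == "non-ttl":
        # Non-TTL mode: TTL is fixed at 255; the control word is mandatory,
        # and only E2E SS-PWs can be detected.
        return 255
    if mode == "ttl":
        # TTL mode: the TTL is automatically calculated or manually
        # configured. For an SS-PW it is 1; for an MS-PW it reflects the
        # number of segments (SPEs) the packet must traverse.
        return hop_count
    raise ValueError("mode must be 'ttl' or 'non-ttl'")
```

For example, an SS-PW in TTL mode uses the default hop count of 1, while non-TTL mode always yields 255.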
Networking Description
Figure 1-165 shows an IP radio access network (RAN) that consists of the following device
roles:
Cell site gateway (CSG): CSGs form the access network. On the IP RAN, CSGs function
as user-end provider edge devices (UPEs) to provide access services for NodeBs.
Aggregation site gateway (ASG): On the IP RAN, ASGs function as SPEs to provide
access services for UPEs.
Radio service gateway (RSG): ASGs and RSGs form the aggregation network. On the IP
RAN, RSGs function as network provider edge devices (NPEs) to connect to the radio
network controller (RNC).
The primary PW is along CSG1–ASG3–RSG5, and the secondary PW is along
CSG1–CSG2–ASG4–RSG6. If the primary PW fails, traffic switches to the secondary PW.
Feature Deployment
Configure static BFD for PW on the IP RAN as follows:
1. On CSG1, configure static BFD for the primary and secondary PWs.
2. On RSG5, configure static BFD for the primary PW.
3. On RSG6, configure static BFD for the secondary PW.
When you configure static BFD for PW, note the following points:
When you configure static BFD for the primary PW, ensure that the local discriminator on CSG1 is
the remote discriminator on RSG5 and that the remote discriminator on CSG1 is the local
discriminator on RSG5.
When you configure static BFD for the secondary PW, ensure that the local discriminator on CSG1
is the remote discriminator on RSG6 and that the remote discriminator on CSG1 is the local
discriminator on RSG6.
After you configure static BFD for PW on CSG1 and primary/secondary RSGs, services can
quickly switch to the secondary PW if the primary PW fails.
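The discriminator matching rule above (local on one end equals remote on the other, and vice versa) can be expressed as a small validation sketch (illustrative Python; the function and tuple layout are assumptions, not device configuration):

```python
def discriminators_consistent(csg, rsg):
    """Check the static BFD for PW discriminator rule: the local
    discriminator on one PW endpoint must equal the remote
    discriminator on the other endpoint, and vice versa.
    Each endpoint is a (local_discriminator, remote_discriminator)
    tuple. Illustrative only."""
    csg_local, csg_remote = csg
    rsg_local, rsg_remote = rsg
    return csg_local == rsg_remote and csg_remote == rsg_local
```

A mirrored pair such as (10, 20) on CSG1 and (20, 10) on RSG5 passes the check; identical pairs on both ends do not.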
Service Overview
IP/MPLS backbone networks carry an increasing number of multicast services, such as IPTV,
video conferences, and massively multiplayer online role-playing games (MMORPGs), which
all require bandwidth assurance, QoS guarantee, and high network reliability. To provide
better multicast services, the IETF proposed the multicast VPLS solution. On a multicast
VPLS network, the ingress transmits multicast traffic to multiple egresses over a P2MP MPLS
tunnel. This solution eliminates the need to deploy PIM and HVPLS on the transit nodes,
simplifying network deployment.
On a multicast VPLS network, multicast traffic can be carried over either P2MP TE tunnels or
P2MP mLDP tunnels. When P2MP TE tunnels are used, P2MP TE FRR must be deployed. If
a link fault occurs, FRR allows traffic to be rapidly switched to a normal link. If a node fails,
however, traffic is not switched until the root node detects the fault and recalculates links to
set up a Source to Leaf (S2L) sub-LSP. Topology convergence takes a long time in this
situation, affecting service reliability.
To meet the reliability requirements of multicast services, configure BFD for multicast VPLS
to monitor multicast VPLS links. When a link or node fails, BFD on the leaf nodes can
rapidly detect the fault and trigger protection switching so that the leaf nodes receive traffic
from the backup multicast tunnel.
Networking Description
Figure 1-166 shows a dual-root 1+1 protection scenario in which PE-AGG1 is the master root
node and PE-AGG2 is the backup root node. Each root node sets up a complete MPLS
multicast tree to the UPEs (leaf nodes). The two MPLS multicast trees do not have
overlapping paths. After multicast flows reach PE-AGG1 and PE-AGG2, PE-AGG1 and
PE-AGG2 send the multicast flows along their respective P2MP tunnels to UPEs. Each UPE
receives two copies of multicast flows and selects one to send to users.
BFD for multicast VPLS sessions support only one-way detection. The BFD session of the
MultiPointHead type on a root node only sends packets, whereas the BFD session of the MultiPointTail
type on a leaf node only receives packets.
On the network shown in Figure 1-166, if link 1 (an AC) fails, BFD on the master root node
detects that the AC interface is Down and stops sending BFD detection packets. The leaf
nodes cannot receive BFD detection packets, and therefore report the Down event, which
triggers protection switching. The leaf nodes then receive multicast flows from the backup
multicast tunnel. Similarly, if node 2, link 3, node 4, or link 5 fails, the leaf nodes also receive
multicast flows from the backup multicast tunnel. After the fault is rectified, BFD sessions are
reestablished. The leaf nodes then receive multicast flows from the master multicast tunnel
again.
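The one-way detection behavior above, where a leaf-side session only receives packets and triggers protection switching when they stop arriving, can be sketched as follows (illustrative Python; class and method names are assumptions):

```python
class MultiPointTail:
    """Sketch of a leaf-side (MultiPointTail) BFD session for
    multicast VPLS: it only receives packets and reports a Down
    event when none arrive within the detection time, which
    triggers switching to the backup multicast tunnel.
    Times are plain numbers in seconds. Illustrative only."""

    def __init__(self, detect_time):
        self.detect_time = detect_time
        self.last_rx = None
        self.up = False

    def on_packet(self, now):
        # The root-side (MultiPointHead) session only sends packets.
        self.last_rx = now
        self.up = True

    def poll(self, now):
        """Return 'switch-to-backup' when the master tree times out."""
        if self.up and now - self.last_rx > self.detect_time:
            self.up = False
            return "switch-to-backup"
        return None
```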
Currently, BFD for PIM can be used on both IPv4 PIM-SM/Source-Specific Multicast (SSM) and IPv6
PIM-SM/SSM networks.
As shown in Figure 1-167, on the shared network segment where user hosts reside, a PIM
BFD session is set up between the downstream interface Port 2 of Device B and the
downstream interface Port 1 of Device C. Both ports send BFD packets to detect the status of
the link between them.
Port 2 of Device B is elected as a DR for forwarding multicast data to the receiver. If Port 2
fails, BFD immediately notifies the RM module of the session status and the RM module then
notifies the PIM module. The PIM module triggers a new DR election. Port 1 of Device C is
then elected as a new DR to forward multicast data to the receiver.
1.5.3 LMSP
1.5.3.1 Introduction
Definition
Linear multiplex section protection (LMSP) is an SDH interface-based protection technique
that uses an SDH interface to protect services on another SDH interface. If a link failure
occurs, LMSP enables a device to send a protection switching request over K bytes to its peer
device. The peer device then returns a switching bridge reply.
Purpose
Large numbers of low-speed links still exist on the user side. These links may be unstable due
to aging. These links have a small capacity and may fail to work properly due to congestion in
traffic burst scenarios. Therefore, a protection technique is required to provide reliability and
stability for these low-speed links.
LMSP is an inherent feature of an SDH network. When a mobile bearer network is deployed,
a router must be connected to an add/drop multiplexer (ADM) or RNC, both of which support
LMSP. As the original protection function of the router cannot properly protect the
communication channel between the router and ADM or RNC, LMSP is introduced to resolve
this issue.
Benefits
LMSP offers the following benefits:
Improves the reliability and security of low-speed links and enhances product
credibility and market competitiveness by reducing labor costs (automatic switching)
and decreasing network interruption time (rapid switching).
Improves user experience by increasing user access success rates.
1.5.3.2 Principles
LMSP is a redundancy protection mechanism that uses a backup channel to protect services
on a channel. LMSP is defined in ITU-T G.783 and G.841 and used to protect multiplex
section (MS) layers in linear networking mode. LMSP applies to point-to-point physical
networks.
LMSP can protect services against disconnection of the optical fiber on which the working MS resides,
regenerator failures, and MS performance deterioration. It does not protect against node failures.
Devices on an SDH network communicate with each other by multiplexing services into SDH payloads and transmitting the
payloads over optical fibers. An LMSP-enabled router can protect traffic on a link to an ADM
on an SDH network that has LMSP functions. Two LMSP-enabled routers can also interwork
to protect traffic on the direct link between them.
Linear MS Mode
Linear MS modes are classified as 1+1 or 1:N protection modes by protection structure (only
1:1 protection is implemented).
In 1+1 protection mode, each working link has a dedicated protection link as its backup.
In a process called bridging, a transmit end transmits data on both the working and
protection links simultaneously. In normal circumstances, a receive end receives data
from the working link. If the working link fails and the receive end detects the failure,
the receive end receives data from the protection link instead. Generally, only the
receive end performs the switching action (single-ended protection), so K1 and K2
bytes are not required for LMSP negotiation.
The 1+1 protection mode has advantages such as rapid traffic switching and high
reliability. However, this mode has a low channel usage (about 50%). Figure 1-168
shows the 1+1 protection mode.
In 1:N protection mode, a protection link provides traffic protection for N working links
(1 ≤ N ≤ 14). In normal circumstances, a transmit end transmits data on a working link.
The protection link can transmit low-priority data or it may not transmit any data. If the
working link fails, the transmit end bridges data onto the protection link. The receive end
then receives data from the protection link. If the transmit end is transmitting
low-priority data on the protection link, it will stop the data transmission and start
transmitting high-priority protected data. Figure 1-169 shows the 1:N protection mode.
If several working links fail at the same time, only data on the working link with the
highest priority can be switched to the protection link. Data on other faulty working links
is lost.
When N is 1, the 1:N protection mode becomes the 1:1 protection mode.
The 1:N protection mode requires both a transmit end and a receive end to perform
switching. Therefore, K1 and K2 bytes are required for negotiation. The 1:N protection
mode has a high channel usage but poorer reliability than the 1+1 protection mode.
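The 1:N contention rule described above (when several working links fail at once, only the highest-priority one is switched to the protection link) can be sketched as follows (illustrative Python; the mapping format is an assumption):

```python
def select_protected_link(faulty_links):
    """In 1:N LMSP, the single protection link can carry only one
    working link's traffic. If several working links fail at the
    same time, only the highest-priority faulty link is switched;
    data on the other faulty links is lost.
    faulty_links maps link number -> priority (larger = higher).
    Illustrative only."""
    if not faulty_links:
        return None
    return max(faulty_links, key=faulty_links.get)
```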
Linear MS K Bytes
LMSP uses APS to control bridging, switching, and recovery actions. APS information is
transmitted over the K1 and K2 bytes in the MS overhead in an SDH frame structure. Table
1-48 lists the bit layout of the K1 and K2 bytes.
Both K1 and K2 are single bytes; the bits of each byte are numbered 7 (most significant) through 0.
Bits 3, 2, 1, and 0 of the K1 byte: switching request channel numbers. The value 0
indicates a protection channel. The values 1 to 14 indicate working channels (the value
can be only 1 in 1+1 protection mode). The value 15 indicates an extra service channel
(the value can be 15 only in 1:N protection mode).
Bits 7, 6, 5, and 4 of the K2 byte: bridging/switching channel numbers. The value
meanings of a bridging channel number are the same as those of a switching request
channel number.
Bit 3 of the K2 byte: protection mode. The value 0 indicates 1+1 protection, and the
value 1 indicates 1:1 protection.
Bits 2, 1, and 0 of the K2 byte: MS status code. The values are as follows:
− 000: idle state
− 111: multiplex section alarm indication signal (MS-AIS)
− 110: multiplex section remote defect indication (MS-RDI)
− 101: dual-ended
− 100: single-ended (not defined by standards)
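The K1/K2 field layout above can be decoded with simple bit operations (an illustrative Python sketch covering only the fields described in this section; the function name is an assumption):

```python
def decode_k_bytes(k1, k2):
    """Decode the APS fields carried in the K1 and K2 bytes as
    listed above (bit 7 = most significant bit). Illustrative only."""
    status_names = {0b000: "idle", 0b111: "MS-AIS", 0b110: "MS-RDI",
                    0b101: "dual-ended", 0b100: "single-ended"}
    return {
        "request_channel": k1 & 0x0F,          # K1 bits 3..0
        "bridge_channel": (k2 >> 4) & 0x0F,    # K2 bits 7..4
        "mode": "1:1" if k2 & 0x08 else "1+1", # K2 bit 3
        "status": status_names.get(k2 & 0x07, "reserved"),  # K2 bits 2..0
    }
```

For example, K2 = 0x1D carries bridge channel 1, 1:1 mode, and the dual-ended status code (101).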
PGP
MC-LMSP is implemented between main control boards over PGP. The connection mode is
UDP. Figure 1-172 shows the communication process.
1. The interface board of the master device sends a message to the main control board
through the IPC.
2. The main control board of the master device constructs a PGP packet and invokes the
socket to send the packet.
3. The master device sends the packet from the main control board to the interface board
over the VP.
4. The master device sends the packet through an interface.
5. The backup device receives the packet on an interface board.
6. The backup device sends the packet to the main control board over the VP.
7. The backup device performs APS PGP processing.
8. The main control board of the backup device sends a message to the interface board
through the IPC.
9. The APS module on the backup device's interface board performs processing.
1. The interfaces on TPE2 and TPE3 form an MC-LMSP group. TPE2 and TPE3 are
configured as the working and protection NEs, respectively. The LMSP state machine
runs on TPE3.
2. PW1 and PW2 form an inter-device PW APS group.
3. A DNI-PW is deployed between TPE2 and TPE3 for traffic switching.
4. An ICB channel is deployed to synchronize the status between TPE2 and TPE3.
1.5.3.3 Applications
1.5.3.3.1 Application of Single-chassis LMSP on a Mobile Bearer Network
On the network shown in Figure 1-174, single-chassis LMSP is deployed on the access and
network sides of the router.
On the access side, a NodeB/BTS is connected to the router over an E1 or SDH link, and
a microwave or SDH device is connected to the router over an optical fiber.
Single-chassis LMSP is configured for the STM-1 link between the router and
microwave or SDH device.
On the network side, the router is connected to PEs. Single-chassis LMSP is configured
on POS or CPOS interfaces.
Access Side
Scenario 1: On the network shown in Figure 1-175, a base station is connected to the router
through the microwave devices and then over the IMA/TDM link (CPOS interface) that has
LMSP configured. The RNC is connected to the device over the IMA/TDM link (CPOS
interface). After base station data reaches the router, the base station can interwork with the
RNC over the PW between the router and device.
Scenario 2: On the network shown in Figure 1-176, a base station is connected to the router
through the microwave devices and then over the IMA link (CPOS interface) that has LMSP
configured. The RNC is connected to the device over the ATM link. After base station data
reaches the router, the base station can interwork with the RNC over the PW between the
router and device.
Network Side
Scenario 1: On the network shown in Figure 1-177, the router's network-side interface is a
CPOS interface on which a global MP group is configured. Single-chassis LMSP is
configured on the CPOS interface. The router is connected to another device to carry
PW/L3VPN/MPLS/DCN services.
Scenario 2: On the network shown in Figure 1-178, the router's network-side interface is a
POS interface. Single-chassis LMSP is configured on the POS interface. The router is
connected to another device to carry PW/VPLS/L3VPN/MPLS/DCN services.
Figure 1-180 Network with MC-LMSP 1+1 protection+two bypass PWs deployed
Definition
As a key technology used on scalable next generation networks, Multiprotocol Label
Switching (MPLS) provides multiple services with quality of service (QoS) guarantee. MPLS,
however, introduces a unique network layer on which faults can occur. Therefore, MPLS networks
must provide operation, administration and maintenance (OAM) capabilities.
OAM is an important means to reduce network maintenance costs. The MPLS OAM
mechanism manages operation and maintenance of MPLS networks.
For details about the MPLS OAM background, see ITU-T Recommendation Y.1710. For
details about the MPLS OAM implementation mechanism, see ITU-T Recommendation
Y.1711.
Purpose
The server-layer protocols, such as Synchronous Optical Network (SONET)/Synchronous
Digital Hierarchy (SDH), are below the MPLS layer; the client-layer protocols, such as IP, FR,
and ATM, are above the MPLS layer. These protocols have their own OAM mechanisms, but
failures at the MPLS layer cannot be completely rectified through the OAM mechanisms of
other layers. In addition, the layered network architecture requires MPLS to have its own
independent OAM mechanism to reduce the dependency between layers.
The MPLS OAM mechanism can detect, identify, and locate a defect at the MPLS layer
effectively. Then, the MPLS OAM mechanism reports and handles the defect. In addition, if a
failure occurs, the MPLS OAM mechanism triggers protection switching.
MPLS offers an OAM mechanism totally independent of any upper or lower layer. The
following OAM features are enabled on the MPLS user plane:
Monitors link connectivity.
Evaluates network usage and performance.
Performs a traffic switchover if a fault occurs so that services meet service level
agreements (SLAs).
Benefit
MPLS OAM can rapidly detect link faults or monitor the connectivity of links, which
helps measure network performance and minimizes OPEX.
If a link fault occurs, MPLS OAM rapidly switches traffic to the standby link to restore
services, which shortens the defect duration and improves network reliability.
Reverse Tunnel
A reverse tunnel is bound to an LSP that is monitored using MPLS OAM. The reverse tunnel
can transmit BDI packets to notify the ingress of an LSP defect.
A reverse tunnel and the LSP to which the reverse tunnel is bound must have the same
endpoints.
The reverse tunnel transmitting BDI packets can be either of the following types:
Private reverse LSP
Shared reverse LSP
The NE20E implements the OAM auto protocol to resolve these drawbacks.
The OAM auto protocol is configured on the egress. With this protocol, the egress can
automatically start OAM functions after receiving the first OAM packet. In addition, the
egress can dynamically stop running the OAM state machine after receiving an FDI packet
sent by the ingress.
1.5.4.2 Principles
1.5.4.2.1 Basic Detection
Background
The Multiprotocol Label Switching (MPLS) operation, administration and maintenance
(OAM) mechanism effectively detects and locates MPLS link faults. The MPLS OAM
mechanism also triggers a protection switchover after detecting a fault.
Related Concepts
MPLS OAM packets
Table 1-50 describes MPLS OAM packets.
Backward defect indication (BDI) packet Sent by the egress to notify the ingress of
an LSP defect.
Channel defects
Table 1-51 describes channel defects that MPLS OAM can detect.
All of the defects listed below are MPLS layer defects.
dLOCV: a connectivity verification loss defect. A dLOCV defect occurs if no CV or
FFD packets are received after three consecutive intervals at which CV or FFD packets
are sent elapse.
dTTSI_Mismatch: a trail termination source identifier (TTSI) mismatch defect. A
dTTSI_Mismatch defect occurs if no CV or FFD packets with correct TTSIs are received
after three consecutive intervals at which CV or FFD packets are sent elapse.
dTTSI_Mismerge: a TTSI mis-merging defect. A dTTSI_Mismerge defect occurs if CV
or FFD packets with both correct and incorrect TTSIs are received within three
consecutive intervals at which CV or FFD packets are sent.
dExcess: an excessive rate at which connectivity detection packets are received. A
dExcess defect occurs if five or more correct CV or FFD packets are received within
three consecutive intervals at which CV or FFD packets are sent.
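The defect definitions above can be condensed into a simple classifier over the packet counts observed in the last three sending intervals (a simplified illustrative Python sketch; real detection runs per interval with more state):

```python
def classify_defect(correct, incorrect):
    """Classify an MPLS OAM channel defect from the numbers of CV/FFD
    packets with correct and incorrect TTSIs received over the last
    three sending intervals, per the defect table above.
    Illustrative model only."""
    if correct == 0 and incorrect == 0:
        return "dLOCV"            # no CV/FFD packets at all
    if correct == 0 and incorrect > 0:
        return "dTTSI_Mismatch"   # only wrong-TTSI packets
    if correct > 0 and incorrect > 0:
        return "dTTSI_Mismerge"   # mix of correct and incorrect TTSIs
    if correct >= 5:
        return "dExcess"          # too many correct packets
    return None                   # no defect
```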
Reverse tunnel
A reverse tunnel is bound to an LSP that is monitored using MPLS OAM. The reverse
tunnel can transmit BDI packets to notify the ingress of an LSP defect. A reverse tunnel
and the LSP to which the reverse tunnel is bound must have the same endpoints, and they
transmit traffic in opposite directions. The reverse tunnels transmitting BDI packets
include private or shared LSPs. Table 1-52 lists the two types of reverse tunnel.
Private reverse LSP: bound to only one LSP. The binding between the private reverse
LSP and its forward LSP is stable but may waste LSP resources.
Shared reverse LSP: bound to many LSPs. A TTSI carried in a BDI packet identifies the
specific forward LSP bound to the reverse LSP. The binding between a shared reverse
LSP and multiple forward LSPs minimizes LSP resource waste. However, if defects
occur on multiple LSPs bound to the shared reverse LSP, the reverse LSP may be
congested with traffic.
Implementation
MPLS OAM periodically sends CV or FFD packets to monitor TE LSPs, PWs, or ring
networks.
MPLS OAM for TE LSPs
MPLS OAM monitors TE LSPs. If MPLS OAM detects a fault in a TE LSP, it triggers a
traffic switchover to minimize traffic loss.
Figure 1-185 illustrates a network on which MPLS OAM monitors TE LSP connectivity.
The process of using MPLS OAM to monitor TE LSP connectivity is as follows:
a. The ingress sends a CV or FFD packet along a TE LSP to be monitored. The packet
passes through the TE LSP and arrives at the egress.
b. The egress compares the packet type, frequency, and TTSI in the received packet
with the locally configured values to verify the packet. In addition, the egress
collects the number of correct and incorrect packets within a detection interval.
c. If the egress detects an LSP defect, the egress analyzes the defect type and sends a
BDI packet carrying defect information to the ingress along a reverse tunnel. The
ingress can then be notified of the defect. If a protection group is configured, the
ingress switches traffic to a backup LSP.
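Step b above, in which the egress verifies each received CV or FFD packet against locally configured values, can be sketched as follows (illustrative Python; the field names are assumptions, not the actual packet format):

```python
def verify_oam_packet(pkt, expected):
    """Sketch of the egress-side check: compare the packet type,
    sending frequency, and TTSI carried in a received CV/FFD packet
    with the locally configured values. Packets are plain dicts
    here; illustrative only."""
    return all(pkt.get(field) == expected[field]
               for field in ("type", "frequency", "ttsi"))
```

The egress counts packets that pass and fail this check within each detection interval; the failure pattern determines the defect type (dLOCV, dTTSI_Mismatch, and so on) reported in the BDI packet.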
MPLS OAM for PWs
MPLS OAM periodically sends CV or FFD packets to monitor PW connectivity. If
MPLS OAM detects a PW defect, it sends BDI packets carrying the defect type along a
reverse tunnel and instructs a client-layer application to switch traffic from the active
link to the standby link.
The dLOCV defect also occurs when OAM is disabled. OAM must be disabled on the
ingress and egress before the OAM detection packet type or the interval at which
detection packets are sent can be changed.
OAM parameters, including a detection packet type and an interval at which detection
packets are sent must be set on both the ingress and egress. This is likely to cause a
parameter inconsistency.
The OAM auto protocol enabled on the egress provides the following functions:
Triggers OAM
− If the sink node does not support OAM CC and CC parameters (including the
detection packet type and interval at which packets are sent), upon the receipt of the
first CV or FFD packet, the sink node automatically records the packet type and
interval at which the packet is sent and uses these parameters in CC detection that
starts.
− If the OAM function-enabled sink node does not receive CV or FFD packets within
a specified period of time, the sink node generates a BDI packet and notifies the
NMS of the BDI defect.
Dynamically stops the OAM state machine. If the detection packet type or the interval at
which detection packets are sent is to be changed on the source node, the source node
sends an FDI packet to instruct the sink node to stop the OAM state machine. The source
node does the same if an OAM function is to be disabled.
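The sink-node behavior of the OAM auto protocol described above can be sketched as a small state machine (illustrative Python; class, method, and field names are assumptions, not the device implementation):

```python
class OamAutoSink:
    """Sketch of the OAM auto protocol on the egress (sink node):
    the first CV/FFD packet starts CC detection using the learned
    packet type and sending interval, and an FDI packet from the
    ingress stops the state machine. Illustrative only."""

    def __init__(self):
        self.running = False
        self.pkt_type = None
        self.interval = None

    def receive(self, packet):
        if packet["kind"] == "FDI":
            # The source node is changing OAM parameters or disabling
            # OAM: stop the state machine and forget learned values.
            self.running = False
            self.pkt_type = self.interval = None
        elif not self.running:
            # First CV/FFD packet: record its type and interval and
            # start CC detection with these parameters.
            self.pkt_type = packet["kind"]
            self.interval = packet["interval"]
            self.running = True
```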
1.5.4.3 Applications
1.5.4.3.1 MPLS OAM Application in the IP RAN Layer 2 to Edge Scenario
MPLS OAM is deployed on PEs to maintain and operate MPLS networks. Working at the
MPLS client and server layers, MPLS OAM can effectively detect, identify, and locate client
layer faults and quickly switch traffic if links or nodes become faulty, reducing network
maintenance cost.
Figure 1-187 illustrates an IP RAN in the Layer 2 to edge scenario. The MPLS OAM
implementation is as follows:
The BTS, NodeB, BSC, and RNC can be directly connected to an MPLS network.
A TE tunnel between PE1 and PE4 is established. PWs are established over the TE
tunnel to transmit various services.
MPLS OAM is enabled on PE1 and PE4, and OAM parameters are configured on PE1
and PE4 at both ends of the PW. These PEs are enabled to send and receive OAM
detection packets, which allows OAM to monitor the PW between PE1 and PE4 and
obtain basic PW information. If OAM detects a defect, PE4 sends a BDI packet to PE1
over a reverse tunnel. The PEs notify the user-side BTS, NodeB, RNC, and BSC of fault
information so that the user-side devices can use the information to maintain their networks.
Service Overview
The operation and maintenance of virtual leased line (VLL) and virtual private LAN service
(VPLS) services require an operation, administration and maintenance (OAM) mechanism.
MultiProtocol Label Switching Transport Profile (MPLS-TP) OAM provides a mechanism to
rapidly detect and locate faults, which facilitates network operation and maintenance and
reduces network maintenance costs.
Networking Description
As shown in Figure 1-188, a user-end provider edge (UPE) on the access network is
dual-homed to SPE1 and SPE2 on the aggregation network. A VLL supporting access links of
various types is deployed on the access network. A VPLS is deployed on the aggregation
network to form a point-to-multipoint leased line network. Additionally, Fast Protection
Switching (FPS) is configured on the UPE; MPLS tunnel automatic protection switching
(APS) is configured on SPE1 and SPE2 to protect the links between the virtual switching
instances (VSIs) created on the two superstratum provider edges (SPEs).
Feature Deployment
To deploy MPLS OAM to monitor link connectivity of VLL and VPLS pseudo wires (PWs),
configure maintenance entity groups (MEGs) and maintenance entities (MEs) on the UPE,
SPE1, and SPE2 and then enable one or more of the continuity check (CC), loss measurement
(LM), and delay measurement (DM) functions. The UPE monitors link connectivity and
performance of the primary and secondary PWs.
MPLS-TP OAM is implemented as follows:
When SPE1 detects a link fault on the primary PW, SPE1 sends a Remote Defect
Indication (RDI) packet to the UPE, instructing the UPE to switch traffic from the
primary PW to the secondary PW. Meanwhile, the UPE sends a MAC Withdraw packet,
in which the value of the PE-ID field is SPE1's ID, to SPE2. After receiving the MAC
Withdraw packet, SPE2 transparently forwards the packet to the NPE and the NPE
deletes the MAC address it has learned from SPE1. After that, the NPE learns a new
MAC address from the secondary PW.
After the primary PW recovers, the UPE switches traffic from the secondary PW back to
the primary PW. Meanwhile, the UPE sends a MAC Withdraw packet, in which the value
of the PE-ID field is SPE2's ID, to SPE1. After receiving the MAC Withdraw packet,
SPE1 transparently forwards the packet to the NPE and the NPE deletes the MAC
address it has learned from SPE2. After that, the NPE learns a new MAC address from
the new primary PW.
Terms
reverse: the direction opposite to the direction in which traffic flows along the
monitored service link.
forward: the direction in which traffic flows along the monitored service link.
path merge LSR: an LSR that receives the traffic transmitted on the protection path in
MPLS OAM protection switching. If the path merge LSR is not the traffic destination, it
merges the traffic transmitted on the protection path back onto the working path. If the
path merge LSR is the traffic destination, it sends the traffic to the upper-layer protocol
for handling.
path switch LSR: an LSR that switches or replicates traffic between the primary service
link and the bypass service link.
user plane: a set of traffic forwarding components through which a traffic flow passes.
An OAM CV or FFD packet is periodically inserted into this traffic flow to monitor the
forwarding component status. In IETF drafts, the user plane is also called the data plane.
ingress: an LSR from which the forward LSP originates and at which the reverse LSP
terminates.
egress: an LSR at which the forward LSP terminates and from which the reverse LSP
originates.
Definition
Multiprotocol Label Switching Transport Profile (MPLS-TP) is a transport technique
that integrates MPLS packet switching with traditional transport network features. MPLS-TP
networks are poised to replace traditional transport networks in the future. MPLS-TP
Operation, Administration, and Maintenance (MPLS-TP OAM) works on the MPLS-TP client
layer. It can effectively detect, identify, and locate faults in the client layer and quickly switch
traffic when links or nodes become defective. OAM is an important part of any plan to reduce
network maintenance expenditures.
Purpose
Both networks and services are part of an ongoing process of transformation and integration.
New services like triple play services, Next Generation Network (NGN) services, carrier
Ethernet services, and Fiber-to-the-x (FTTx) services are constantly emerging from this
process. Such services demand more investment and have higher OAM costs. They require
state of the art QoS, full service access, and high levels of expansibility, reliability, and
manageability of transport networks. Traditional transport network technologies such as
Multi-Service Transfer Platform (MSTP), Synchronous Digital Hierarchy (SDH), or
Wavelength Division Multiplexing (WDM) cannot meet these requirements because they lack
a control plane. Unlike traditional technologies, MPLS-TP does meet these requirements
because it can be used on next-generation transport networks that can process data packets, as
well as on traditional transport networks.
Because traditional transport networks and Optical Transport Network (OTN) networks have high
reliability and maintenance benchmarks, MPLS-TP must provide powerful OAM capabilities.
MPLS-TP OAM provides the following functions:
Fault management
Performance monitoring
Triggering protection switching
Benefits
MPLS-TP OAM can rapidly detect link faults or monitor the connectivity of links, which
helps measure network performance and minimizes OPEX.
If a link fault occurs, MPLS-TP OAM rapidly switches traffic to the standby link to
restore services, which shortens the defect duration and improves network reliability.
MEG
A maintenance entity group (MEG) comprises one or more MEs that are created for a
transport link. If the transport link is a point-to-point bidirectional path, such as a
bidirectional co-routed LSP or pseudo wire (PW), a MEG comprises only one ME.
MEP
A MEP is the source or sink node in a MEG. Figure 1-190 shows ME node deployment.
− For a bidirectional LSP, only the ingress label edge router (LER) and egress LER
can function as MEPs, as shown in Figure 1-190.
− For a PW, only user-end provider edges (UPEs) can function as MEPs.
MEPs trigger and control MPLS-TP OAM operations. OAM packets can be generated or
terminated on MEPs.
Fault Management
Table 1-53 lists the MPLS-TP OAM fault management functions supported by the NE20E.
Continuity check (CC): checks link connectivity periodically.
Connectivity verification (CV): detects forwarding faults continuously.
Loopback (LB): performs loopback tests.
Remote defect indication (RDI): notifies the remote end of defects.
Performance Monitoring
Table 1-54 lists the MPLS-TP OAM performance monitoring functions supported by the
NE20E.
Loss measurement (LM): collects statistics about lost frames. LM includes single-ended
and dual-ended frame loss measurement.
Delay measurement (DM): collects statistics about delays and delay variations (jitter).
DM includes one-way and two-way frame delay measurement.
1.5.5.2 Principles
1.5.5.2.1 Basic Concepts
An MPLS-TP network consists of the section, LSP, and PW layers in bottom-up order. A
lower layer is a server layer, and an upper layer is a client layer. For example, the section
layer is the LSP layer's server layer, and the LSP layer is the section layer's client layer.
On the MPLS-TP network shown in Figure 1-191, MPLS-TP OAM detects and locates faults
in the section, LSP, and PW layers. Table 1-55 describes MPLS-TP OAM components.
MEG end point (MEP): a MEP is the source or sink node in a MEG.
Section layer: each LSR can function as a MEP.
LSP layer: only an LER can function as a MEP. LSRs A, D, E, and G are LERs
functioning as MEPs.
PW layer: only PW terminating provider edge (T-PE) LSRs can function as MEPs.
LSRs A and G are T-PEs functioning as MEPs.
Usage Scenario
MPLS-TP OAM monitors the following types of links:
Static bidirectional co-routed CR-LSPs
Static VLL PWs and VPLS PWs
CC
CC is a proactive OAM operation. It detects LOC faults between any two MEPs in a MEG. A
MEP sends CC messages (CCMs) to a remote MEP (RMEP) at specified intervals. If the RMEP does
not receive a CCM for a period 3.5 times as long as the specified interval, it considers the
connection between the two MEPs faulty. This causes the RMEP to report an alarm and enter
the Down state, and the RMEP triggers automatic protection switching (APS) on both MEPs.
After receiving a CCM from the MEP, the RMEP will clear the alarm and exit the Down state.
CV
CV is also a proactive OAM operation. It enables a MEP to report alarms when unexpected or
error packets are received. For example, if a CV-enabled MEP receives a packet from an LSP
and finds that this packet has been transmitted in error along an LSP, the MEP will report an
alarm indicating a forwarding error.
After receiving CCMs carrying packet count information, both MEPs use the following
formulas to measure near- and far-end packet loss values:
Near-end packet loss value = |TxFCf[tc] - TxFCf[tp]| - |RxFCl[tc] - RxFCl[tp]|
Far-end packet loss value = |TxFCb[tc] - TxFCb[tp]| - |RxFCb[tc] - RxFCb[tp]|
TxFCf[tc], RxFCb[tc], and TxFCb[tc] are the TxFCf, RxFCb, and TxFCb values,
respectively, which are carried in the most recently received CCM. RxFCl[tc] is the local
RxFCl value recorded when the local MEP received the CCM.
TxFCf[tp], RxFCb[tp], and TxFCb[tp] are the TxFCf, RxFCb, and TxFCb values,
respectively, which are carried in the previously received CCM. RxFCl[tp] is the local
RxFCl value recorded when the local MEP received the previous CCM.
tc is the time a current CCM was received.
tp is the time the previous CCM was received.
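The dual-ended calculation above can be sketched as follows. This is an illustrative model in the style of ITU-T Y.1731 dual-ended LM (function and field names are not the product's), assuming each MEP keeps the counter snapshot from the previous CCM:

```python
def dual_ended_loss(prev, curr):
    """Near-/far-end loss from two consecutive CCM counter snapshots.

    prev/curr hold TxFCf, TxFCb, RxFCb (carried in the CCM) and RxFCl
    (the local receive counter recorded when the CCM arrived).
    """
    near = abs(curr["TxFCf"] - prev["TxFCf"]) - abs(curr["RxFCl"] - prev["RxFCl"])
    far = abs(curr["TxFCb"] - prev["TxFCb"]) - abs(curr["RxFCb"] - prev["RxFCb"])
    return near, far

# Peer sent 100 frames toward us, we received 98 of them: 2 lost near end.
prev = {"TxFCf": 100, "TxFCb": 90, "RxFCb": 88, "RxFCl": 97}
curr = {"TxFCf": 200, "TxFCb": 185, "RxFCb": 180, "RxFCl": 195}
```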
To start single-ended loss measurement, the local MEP sends the RMEP a loss measurement message (LMM) carrying TxFCf: the local TxFCl value recorded when the LMM was sent.
After receiving an LMM, the RMEP responds to the local MEP with loss measurement replies
(LMRs) carrying the following information:
TxFCf: equal to the TxFCf value carried in the LMM.
RxFCf: the local RxFCl value recorded when the LMM was received.
TxFCb: the local TxFCl value recorded when the LMR was sent.
Figure 1-193 illustrates proactive single-ended packet loss measurement.
After receiving an LMR, the local MEP uses the following formulas to calculate near- and
far-end packet loss values:
Near-end packet loss value = |TxFCb[tc] - TxFCb[tp]| - |RxFCl[tc] - RxFCl[tp]|
Far-end packet loss value = |TxFCf[tc] - TxFCf[tp]| - |RxFCf[tc] - RxFCf[tp]|
TxFCf[tc], RxFCf[tc], and TxFCb[tc] are the TxFCf, RxFCf, and TxFCb values,
respectively, which are carried in the most recently received LMR. RxFCl[tc] is the local
RxFCl value recorded when the most recent LMR arrives at the local MEP.
TxFCf[tp], RxFCf[tp], and TxFCb[tp] are the TxFCf, RxFCf, and TxFCb values,
respectively, which are carried in the previously received LMR. RxFCl[tp] is the local
RxFCl value recorded when the previous LMR arrived at the local MEP.
tc is the time a current LMR was received.
tp is the time the previous LMR was received.
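The single-ended exchange reduces to the same counter arithmetic. The sketch below (illustrative names, not a product API) takes the counters from two consecutive LMRs received by the local MEP:

```python
def single_ended_loss(prev, curr):
    """Near-/far-end loss from two consecutive LMRs.

    TxFCf, RxFCf, and TxFCb come from the LMR; RxFCl is the local
    receive counter recorded when the LMR arrived.
    """
    near = abs(curr["TxFCb"] - prev["TxFCb"]) - abs(curr["RxFCl"] - prev["RxFCl"])
    far = abs(curr["TxFCf"] - prev["TxFCf"]) - abs(curr["RxFCf"] - prev["RxFCf"])
    return near, far
```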
After the RMEP receives a 1DM, it subtracts the TxTimeStampf value from the RxTimef
value to calculate the delay time:
Frame delay time = RxTimef - TxTimeStampf
The frame delay value can be used to measure the delay variation, which is the absolute difference between two delay time values.
One-way frame delay measurement can only be performed when the two MEPs on both ends
of a link have synchronous time. If these MEPs have asynchronous time, they can only
measure the delay variation.
Upon receipt of the DMR, the local MEP calculates the two-way frame delay time using the
following formula:
Frame delay = RxTimeb (the time the DMR was received) - TxTimeStampf
To obtain a more accurate result, RxTimeStampf and TxTimeStampb are used. RxTimeStampf
indicates the time a DMM is received, and TxTimeStampb indicates the time a DMR is sent.
After the local MEP receives the DMR, it calculates the frame delay time using the following
formula:
Frame delay = (RxTimeb - TxTimeStampf) - (TxTimeStampb - RxTimeStampf)
Two-way frame delay measurement supports both delay and delay variation measurement
even if these MEPs do not have synchronous time. The frame delay time is the round-trip
delay time. If both MEPs have synchronous time, the round-trip delay time can be calculated
by combining the two delay values using the following formulas:
MEP-to-RMEP delay time = RxTimeStampf - TxTimeStampf
RMEP-to-MEP delay time = RxTimeb - TxTimeStampb
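The timestamp arithmetic above can be expressed as a short sketch (function and parameter names are illustrative):

```python
def two_way_delay(tx_stamp_f, rx_stamp_f, tx_stamp_b, rx_time_b):
    """Round-trip frame delay with the RMEP's processing time removed.

    tx_stamp_f: TxTimeStampf, time the DMM was sent by the local MEP
    rx_stamp_f: RxTimeStampf, time the DMM was received by the RMEP
    tx_stamp_b: TxTimeStampb, time the DMR was sent by the RMEP
    rx_time_b:  RxTimeb, time the DMR was received by the local MEP
    """
    return (rx_time_b - tx_stamp_f) - (tx_stamp_b - rx_stamp_f)

def one_way_delays(tx_stamp_f, rx_stamp_f, tx_stamp_b, rx_time_b):
    """MEP-to-RMEP and RMEP-to-MEP delays; meaningful only if both
    MEPs have synchronous time."""
    return rx_stamp_f - tx_stamp_f, rx_time_b - tx_stamp_b
```

Note that the two one-way delays sum to the two-way delay, which is why the round-trip time can be obtained by combining them when the clocks are synchronized.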
After the fault is rectified, the local MEP sets the RDI flag to 0 in CCMs and sends them
to inform the RMEP that the fault is rectified.
The RDI function is associated with the proactive continuity check function and takes effect only
after the continuity check function is enabled.
The RDI function applies only to bidirectional links. In the case of a unidirectional LSP, before RDI
can be used, a reverse path must be bound to the LSP.
1.5.5.2.6 Loopback
Background
On a multiprotocol label switching transport profile (MPLS-TP) network, a virtual circuit may traverse multiple switching devices (nodes), including maintenance association end points (MEPs) and maintenance association intermediate points (MIPs). A fault on any node or link along a virtual circuit may render the entire virtual circuit unavailable, and the fault cannot be located. Loopback (LB) can be configured on a source device (MEP) to detect or locate faults in links between the MEP and a MIP or between MEPs.
Related Concepts
LB and continuity check (CC) are both connectivity monitoring tools on an MPLS-TP network. Table 1-58 describes the differences between CC and LB.
Implementation
The loopback function monitors the connectivity of bidirectional links between a MEP and a
MIP and between MEPs.
A loopback test is initiated on a MEP, and the destination can be set to an RMEP or a MIP.
The loopback test process is as follows:
1. The source MEP sends a loopback message (LBM) to a destination. If a MIP is used as the destination, the TTL in the LBM must be equal to the number of hops from the source to the destination so that the LBM expires at the target MIP. If a MEP is used as the destination, the TTL must be greater than or equal to the number of hops to the destination. The TTL setting prevents the LBM from being discarded before reaching the destination.
2. After the destination receives the LBM, it checks whether the target MIP ID or MEP ID
matches the local MIP ID or MEP ID. If they do not match, the destination discards the
LBM. If they match, the destination responds with a loopback reply (LBR).
3. If the source MEP receives the LBR within a specified period of time, it considers the
destination reachable and the loopback test successful. If the source MEP does not
receive the LBR after the specified period of time elapses, it records a loopback test
timeout and log information that is used to analyze the connectivity failure.
Figure 1-196 illustrates a loopback test. LSRA initiates a loopback test to LSRC on an LSP.
The loopback test process is as follows:
1. LSRA sends LSRC an LBM carrying a specified TTL and a MIP ID. LSRB
transparently transmits the LBM to LSRC.
2. Upon receipt, LSRC determines that the TTL carried in the LBM has expired and checks whether the target MIP ID carried in the LBM matches the local MIP ID. If they do not
match, LSRC discards the LBM. If they match, LSRC responds with an LBR.
3. If LSRA receives the LBR within a specified period of time, it considers LSRC
reachable. If LSRA fails to receive the LBR after a specified period of time elapses,
LSRA considers LSRC unreachable and records log information that is used to analyze
the connectivity failure.
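The TTL-based targeting in the test above can be modeled with a minimal sketch (hypothetical node IDs, not a product interface):

```python
def loopback_test(path_ids, target_id, ttl):
    """Simulate an LBM sent along an LSP.

    path_ids lists the node IDs downstream of the source MEP, in order.
    The LBM is processed by the node where the TTL expires (a terminating
    MEP also processes it when the TTL exceeds the path length); that
    node returns an LBR only if its ID matches the target ID in the LBM.
    """
    receiver = path_ids[min(ttl, len(path_ids)) - 1]
    return receiver == target_id  # True: LBR returned, test succeeds

# LSRA -> LSRB -> LSRC: a TTL of 2 delivers the LBM to LSRC.
```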
1.5.5.3 Applications
1.5.5.3.1 MPLS-TP OAM Application in the IP RAN Layer 2 to Edge Scenario
MPLS-TP OAM is deployed on PEs to maintain and operate MPLS networks. Working at the
MPLS client and server layers, MPLS-TP OAM can effectively detect, identify, and locate
client layer faults and quickly switch traffic if links or nodes become faulty, reducing network
maintenance cost.
Figure 1-197 illustrates an IP RAN in the Layer 2 to edge scenario. The MPLS-TP OAM implementation is as follows:
The BTS, NodeB, BSC, and RNC can be directly connected to an MPLS-TP network.
A TE tunnel between PE1 and PE4 is established. PWs are established over the TE
tunnel to transmit various services.
MPLS-TP OAM is enabled on PE1 and PE4, and OAM parameters are configured on PE1 and PE4 at both ends of a PW. These PEs are enabled to send and receive OAM detection packets, which allows OAM to monitor the PW between PE1 and PE4 and obtain basic PW information. If OAM detects a defect, PE4 sends an RDI packet to PE1 over a reverse tunnel. The PEs notify the user-side BTS, NodeB, RNC, and BSC of fault information so that the user-side devices can use the information to maintain networks.
Service Overview
The operation and maintenance of virtual leased line (VLL) and virtual private LAN service
(VPLS) services require an operation, administration and maintenance (OAM) mechanism.
MultiProtocol Label Switching Transport Profile (MPLS-TP) OAM provides a mechanism to
rapidly detect and locate faults, which facilitates network operation and maintenance and
reduces the network maintenance costs.
Networking Description
As shown in Figure 1-198, a user-end provider edge (UPE) on the access network is
dual-homed to SPE1 and SPE2 on the aggregation network. A VLL supporting access links of
various types is deployed on the access network. A VPLS is deployed on the aggregation
network to form a point-to-multipoint leased line network. Additionally, Fast Protection
Switching (FPS) is configured on the UPE; MPLS tunnel automatic protection switching
(APS) is configured on SPE1 and SPE2 to protect the links between the virtual switching
instances (VSIs) created on the two superstratum provider edges (SPEs).
Feature Deployment
To deploy MPLS-TP OAM to monitor link connectivity of VLL and VPLS pseudo wires
(PWs), configure maintenance entity groups (MEGs) and maintenance entities (MEs) on the
UPE, SPE1, and SPE2, and then enable the continuity check (CC) and loopback (LB) functions as needed. The UPE monitors link connectivity and performance of the primary
and secondary PWs.
MPLS-TP OAM is implemented as follows:
When SPE1 detects a link fault on the primary PW, SPE1 sends a Remote Defect
Indication (RDI) packet to the UPE, instructing the UPE to switch traffic from the
primary PW to the secondary PW. Meanwhile, the UPE sends a MAC Withdraw packet,
in which the value of the PE-ID field is SPE1's ID, to SPE2. After receiving the MAC
Withdraw packet, SPE2 transparently forwards the packet to the NPE and the NPE
deletes the MAC address it has learned from SPE1. After that, the NPE learns a new
MAC address from the secondary PW.
After the primary PW recovers, the UPE switches traffic from the secondary PW back to
the primary PW. Meanwhile, the UPE sends a MAC Withdraw packet, in which the value
of the PE-ID field is SPE2's ID, to SPE1. After receiving the MAC Withdraw packet,
SPE1 transparently forwards the packet to the NPE and the NPE deletes the MAC
address it has learned from SPE2. After that, the NPE learns a new MAC address from
the new primary PW.
Terms
None
Abbreviations
Abbreviation Full Name
AIS Alarm Indication Signal
CC Continuity Check
1.5.6 VRRP
1.5.6.1 Introduction
Definition
The Virtual Router Redundancy Protocol (VRRP) is a fault-tolerant protocol that groups
several routers into a virtual router. If the next hop of a host fails, VRRP switches traffic to
another router, which ensures communication continuity and reliability.
In this document, if a VRRP function supports both IPv4 and IPv6, the implementation of this VRRP
function is the same for IPv4 and IPv6 unless otherwise specified.
VRRP is a fault-tolerant protocol defined in relevant standards. VRRP allows logical devices
to work separately from physical devices and implements route selection among multiple
egress gateways.
On the network shown in Figure 1-199, VRRP is enabled on two routers. One is the master
and the other is the backup. The two routers form a virtual router and this virtual router is
assigned a virtual IP address and a virtual MAC address. Hosts monitor only the presence of
the virtual router. The hosts communicate with devices on other network segments through
the virtual router.
A virtual router consists of a master router and one or more backup routers. Only the master
router forwards packets. If the master router fails, a backup router is elected as the master
router and takes over.
On a multicast or broadcast LAN (for example, an Ethernet), VRRP uses a logical VRRP
gateway to ensure reliability for key links. VRRP prevents service interruptions if a physical
VRRP gateway fails, providing high reliability. VRRP configuration is simple and takes effect without requiring modifications to existing configurations, such as routing protocol configurations.
Purpose
As networks rapidly develop and applications become diversified, various value-added
services, such as Internet Protocol television (IPTV) and video conferencing, have become
widespread. Demands for network infrastructure reliability are increasing, especially in
nonstop network transmission.
Generally, hosts use one default gateway to communicate with external networks. If the
default gateway fails, communication between the hosts and external networks is interrupted.
System reliability can be improved using dynamic routing protocols (such as RIP and OSPF)
or ICMP Router Discovery Protocol (IRDP). However, this method requires complex
configurations and each host must support dynamic routing protocols.
VRRP resolves this issue by enabling several routers to be grouped into a virtual router, also
called a VRRP backup group. In normal circumstances, the master router in the VRRP backup
group functions as a default gateway and provides access services for users. If the master
router fails, VRRP elects a backup router from the VRRP backup group to provide access
services for users.
Hosts on a local area network (LAN) are usually connected to an external network through a
default gateway. When the hosts send packets destined for addresses out of the local network
segment, these packets follow a default route to an egress gateway. A provider edge (PE)
functions as an egress gateway on the network shown in Figure 1-200. The PE forwards
packets to the external network so that the hosts can communicate with the external network.
If the PE fails, the hosts connected to it cannot communicate with the external network. The
communication failure persists even if another router is added to the LAN. This is because most hosts on a LAN support only a single default gateway, which forwards all data packets destined for devices outside the local network segment. Hosts send packets only through the default gateway even when they are connected to multiple routers.
Configuring multiple egress gateways is a common method to prevent communication
interruptions. This method is available only if one of the routes to these egress gateways can be selected. Another method is to use dynamic routing protocols, such as the Routing Information Protocol (RIP), Open Shortest Path First (OSPF), and ICMP Router Discovery Protocol (IRDP). This method is available only if every host runs a dynamic routing protocol and there are no problems in management, security, or operating system support for these protocols.
VRRP prevents communication failures in a better way than the preceding two methods.
VRRP is configured only on routers to implement gateway backup, without any networking
changes or burden on hosts.
Benefits
VRRP offers the following benefits to carriers:
Reliable transmission: A logical VRRP gateway on a multicast or broadcast local area
network (LAN), such as an Ethernet network, ensures reliable transmission over key
links. VRRP helps prevent service interruptions if a link to a physical VRRP gateway
fails.
Flexible applications: A VRRP header is encapsulated into an IP packet. This
implementation allows the association between VRRP and various upper-layer protocols.
Low network overheads: VRRP uses only VRRP Advertisement packets.
VRRP load balancing is classified as multi-gateway or single-gateway load balancing. For details about
VRRP load balancing, see the chapter "VRRP" in HUAWEI NE20E-S2 Universal Service Router
Feature Description - Network Reliability.
1.5.6.2 Principles
1.5.6.2.1 Basic VRRP Concepts
As shown in Figure 1-203, two gateways are grouped to form a virtual gateway, and the user
host uses the virtual gateway's IP address as the default gateway IP address to communicate
with the external network. If the default gateway fails, VRRP elects a new gateway to provide
access services for the user.
Virtual router: also called a VRRP backup group, consists of a master router and one or
more backup routers. A virtual router is a default gateway used by hosts within a shared
local area network (LAN). A virtual router ID and one or more virtual IP addresses
together identify a virtual router.
− Virtual router ID (VRID): ID of a virtual router. Routers with the same VRID form
a virtual router.
− Virtual IP address: IP address of a virtual router. A virtual router can have one or
more virtual IP addresses, which are manually assigned.
− Virtual MAC address: MAC address generated by a virtual router based on a VRID.
A virtual router has one virtual MAC address, in the format of
00-00-5E-00-01-{VRID} (VRRP for IPv4) or 00-00-5E-00-02-{VRID} (VRRP for
IPv6). After a virtual router receives an ARP (VRRP for IPv4) or NS (VRRP for
IPv6) request, it responds to the request with the virtual MAC address rather than
the actual MAC address.
IP address owner: VRRP router that uses the virtual IP address as its interface IP address.
If an IP address owner is available, it functions as the master router.
Primary IP address: IP address selected from actual interface IP addresses, which is
usually the first IP address that is configured. The primary IP address is used as the
source IP address in a VRRP Advertisement packet.
VRRP router: device running VRRP. A VRRP router can join one or more VRRP backup
groups. A VRRP backup group consists of the following VRRP routers:
− Master router: forwards packets and responds to ARP requests.
− Backup router: does not forward packets when the master router is working
properly, but can be elected as the new master router if the master router fails.
Priority: priority of a router in a VRRP backup group. A VRRP backup group elects the
master and backup routers based on router priorities.
VRRP working modes:
− Preemption mode: A backup router with a higher priority than the master router
preempts the Master state.
− Non-preemption mode: When the master router is working properly, a backup
router does not preempt the Master state even if it has a priority higher than the
master router.
VRRP timers:
− Adver_Interval timer: The master router sends a VRRP Advertisement packet each
time the Adver_Interval timer expires. The default timer value is 1 second.
− Master_Down timer: A backup router preempts the Master state after the
Master_Down timer expires. The Master_Down timer value (in seconds) is
calculated using the following equation:
Master_Down timer value = (3 x Adver_Interval timer value) + Skew_Time
where
Skew_Time = (256 - Backup router's priority)/256
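The virtual MAC format and the timer equation above can be expressed as a small sketch (function names are illustrative, not a product API):

```python
def virtual_mac(vrid, ipv6=False):
    """00-00-5E-00-01-{VRID} for VRRP for IPv4, 00-00-5E-00-02-{VRID} for IPv6."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in the range 1-255")
    return "00-00-5E-00-0%d-%02X" % (2 if ipv6 else 1, vrid)

def master_down_interval(adver_interval, priority):
    """Master_Down = 3 x Adver_Interval + Skew_Time,
    where Skew_Time = (256 - priority) / 256 seconds."""
    return 3 * adver_interval + (256 - priority) / 256.0

# A higher-priority backup router has a shorter Master_Down timer, so it
# preempts the Master state sooner than lower-priority backup routers.
```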
VRRP versions include VRRPv2 and VRRPv3. VRRPv2 applies only to IPv4 networks, and
VRRPv3 applies to both IPv4 and IPv6 networks. VRRP is classified as VRRP for IPv4
(VRRP4) or VRRP for IPv6 (VRRP6) by network type. VRRP4 supports both VRRPv2 and
VRRPv3, and VRRP6 supports only VRRPv3.
For an IPv4 network, VRRP packets are encapsulated into IPv4 packets and sent to an IPv4
multicast address assigned to a VRRP4 backup group. In an IPv4 packet header:
The source address is the primary IPv4 address of the interface that sends the packet.
The destination address is 224.0.0.18.
The time to live (TTL) value is 255.
The protocol number is 112.
For an IPv6 network, VRRP packets are encapsulated into IPv6 packets and sent to an IPv6
multicast address assigned to a VRRP6 backup group. In an IPv6 packet header:
The source address is the link-local address of the interface that sends the packet.
The destination address is FF02::12.
The hop count is 255.
The protocol number is 112.
The NE20E allows you to manually switch a VRRP version. VRRP packets refer to VRRPv2 packets, unless otherwise specified in this document.
Field Description
Version: Version number of the VRRP protocol. The value is 2.
Type: Type of the VRRPv2 packet. The value is 1, indicating that the packet is an advertisement packet.
Virtual Rtr ID: Virtual router identifier.
Priority: Priority of the master router in a VRRP backup group.
Count IPv4 Addrs: Number of virtual IPv4 addresses configured for a VRRP backup group.
Auth Type: VRRPv2 packet authentication type. VRRPv2 defines the following authentication types:
0: Non Authentication, indicating that authentication is not performed.
1: Simple Text Password, indicating that simple authentication is performed.
2: IP Authentication Header, indicating that MD5 authentication is performed.
Adver Int: Interval at which VRRPv2 packets are sent, in seconds.
Checksum: 16-bit checksum, used to check the data integrity of the VRRPv2 packet.
IPv4 Address: Virtual IPv4 address configured for a VRRP backup group.
Authentication Data: Authentication key in the VRRPv2 packet. This field applies only when simple or MD5 authentication is used. For other authentication types, this field is fixed to 0.
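The Checksum field holds the standard 16-bit one's-complement Internet checksum, computed over the VRRPv2 message with the Checksum field set to zero. A minimal sketch (illustrative, not the product implementation):

```python
def vrrp_checksum(message):
    """Internet checksum over a VRRPv2 message (Checksum field zeroed)."""
    if len(message) % 2:          # pad to a 16-bit boundary
        message += b"\x00"
    total = 0
    for i in range(0, len(message), 2):
        total += (message[i] << 8) | message[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return (~total) & 0xFFFF

# Version 2/Type 1, VRID 1, Priority 100, one virtual IP (192.0.2.1),
# no authentication, Adver Int 1 second, Checksum bytes zeroed:
msg = bytes([0x21, 0x01, 0x64, 0x01, 0x00, 0x01, 0x00, 0x00,
             0xC0, 0x00, 0x02, 0x01])
```

Verifying a received message with its checksum in place yields 0, which is how a receiver checks data integrity.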
As shown in Figure 1-204 and Figure 1-205, the main differences between VRRPv2 and
VRRPv3 are as follows:
VRRPv2 supports authentication, whereas VRRPv3 does not.
VRRPv2 supports a second-level interval between sending VRRP Advertisement packets,
whereas VRRPv3 supports a centisecond-level interval.
Master: A router in the Master state provides the following functions:
− Sends a VRRP Advertisement packet each time the Adver_Interval timer expires.
− Responds to an ARP request with an ARP reply carrying the virtual MAC address.
− Forwards IP packets sent to the virtual MAC address.
The master router changes its status as follows:
− Changes from Master to Backup if the VRRP priority in a received VRRP Advertisement packet is higher than the local VRRP priority.
− Remains in the Master state if the VRRP priority in a received VRRP Advertisement packet is lower than the local VRRP priority.
NOTE
If devices in a VRRP backup group are in the Master state and a device receives a VRRP Advertisement packet with the same priority as the local VRRP priority, the device compares the IP address in the packet with the local IP address. If the IP address in the packet is greater than the local IP address, the device switches to the Backup state. If the IP address in the packet is less than or equal to the local IP address, the device remains in the Master state.
Backup: A router in the Backup state provides the following functions:
− Receives VRRP Advertisement packets from the master router and checks whether the master router is working properly based on information in the packets.
− Does not respond to an ARP request carrying a virtual IP address.
− Discards IP packets sent to the virtual MAC address.
− Discards IP packets sent to virtual IP addresses.
A backup router changes its status as follows:
− Changes from Backup to Master after it receives a Master_Down timer timeout event.
− Changes from Backup to Initialize after it receives a Shutdown event, indicating that the VRRP-enabled interface has been shut down.
− If, in preemption mode, it receives a VRRP Advertisement packet carrying a VRRP priority lower than the local VRRP priority, it preempts the Master state after a specified preemption delay.
− If, in non-preemption mode, it receives a VRRP Advertisement packet carrying a VRRP priority lower than the local VRRP priority, it remains in the Backup state.
− Resets the Master_Down timer but does not compare IP addresses if it receives a VRRP Advertisement packet carrying a VRRP priority higher than or equal to the local VRRP priority.
If multiple VRRP routers enter the Master state at the same time, they exchange VRRP
Advertisement packets to determine the master or backup role. The VRRP router with the highest
priority remains in the Master state, and VRRP routers with lower priorities switch to the Backup
state. If these routers have the same priority, the router whose VRRP-enabled interface has the largest primary IP address becomes the master router.
If a VRRP router is the IP address owner, it immediately switches to the Master state after receiving
a Startup event.
If network congestion occurs, a backup router may not receive VRRP Advertisement packets from the
master router. If this situation occurs, the backup router proactively switches to the Master state. If the
new master router receives a VRRP Advertisement packet from the original master router, the new
master router will switch back to the Backup state. As a result, the routers in the VRRP backup group
frequently switch between Master and Backup. You can configure a preemption delay to resolve this
issue. After the configuration is complete, the backup router with the highest priority switches to the
Master state only when all of the following conditions are met:
VRRP Authentication
VRRP supports different authentication modes and keys in VRRP Advertisement packets that
meet various network security requirements.
On secure networks, you can use the non-authentication mode. In this mode, a device does not add authentication information to VRRP Advertisement packets before sending them. After a peer device receives VRRP Advertisement packets, it does not authenticate them either; it considers them authentic and valid.
On insecure networks, you can use the simple or message digest algorithm 5 (MD5)
authentication mode.
Master/Backup Mode
A VRRP backup group comprises a master router and one or more backup routers. As shown
in Figure 1-207, Device A is the master router and forwards packets, and Device B and
Device C are backup routers and monitor Device A's status. If Device A fails, Device B or
Device C is elected as a new master router and takes over services from Device A.
3. After Device A recovers, it enters the Backup state (its priority remains 120). After
receiving a VRRP Advertisement packet from Device B, the current master, Device A
finds that its priority is higher than that of Device B. Therefore, Device A preempts the
Master state after the preemption delay elapses, and sends VRRP Advertisement packets
and gratuitous ARP packets.
After receiving a VRRP Advertisement packet from Device A, Device B finds that its
priority is lower than that of Device A and changes from the Master state to the Backup
state. User traffic then switches to the original path Device E -> Device A -> Device D.
As shown in Figure 1-208, VRRP backup groups 1 and 2 are deployed on the network.
− VRRP backup group 1: Device A is the master router, and Device B is the backup
router.
− VRRP backup group 2: Device B is the master router, and Device A is the backup
router.
VRRP backup groups 1 and 2 back up each other and serve as gateways for different
users, therefore load-balancing service traffic.
As shown in Figure 1-209, VRRP backup groups 1 and 2 are deployed on the network.
− VRRP backup group 1: an LBRG. Device A is the master router, and Device B is the
backup router.
− VRRP backup group 2: an LBRG member group. Device B is the master router, and
Device A is the backup router.
VRRP backup group 1 serves as a gateway for all users. After receiving an ARP request
packet from a user, VRRP backup group 1 returns an ARP response packet and
encapsulates its virtual MAC address or VRRP backup group 2's virtual MAC address in
the response.
1.5.6.2.5 mVRRP
Principles
A switch is dual-homed to two routers at the aggregation layer on a metropolitan area network
(MAN). Multiple VRRP backup groups can be configured on the two routers to transmit
various types of services. Because each VRRP backup group must maintain its own state
machine, a large number of VRRP Advertisement packets are transmitted between the routers.
To help reduce bandwidth and CPU resource consumption during VRRP packet transmission,
a VRRP backup group can be configured as a management Virtual Router Redundancy
Protocol (mVRRP) backup group. Other VRRP backup groups are bound to the mVRRP
backup group and become service VRRP backup groups. Only the mVRRP backup group
sends VRRP packets to negotiate the master/backup status. The mVRRP backup group
determines the master/backup status of service VRRP backup groups.
As shown in Figure 1-210, an mVRRP backup group can be deployed on the same side as
service VRRP backup groups or on the interfaces that directly connect Device A and Device
B.
Related Concepts
mVRRP backup group: has all functions of a common VRRP backup group. Different from a
common VRRP backup group, an mVRRP backup group can be tracked by service VRRP
backup groups and determine their statuses. An mVRRP backup group provides the following
functions:
When the mVRRP backup group functions as a gateway, it determines the master/backup
status of devices and transmits services. In this situation, a common VRRP backup group
with the same ID as the mVRRP backup group must be created and assigned a virtual IP
address. The mVRRP backup group's virtual IP address is a gateway IP address set by
users.
When the mVRRP backup group does not function as a gateway, it determines the
master/backup status of devices but does not transmit services. In this situation, the
mVRRP backup group does not require a virtual IP address. You can create an mVRRP
backup group directly on interfaces to simplify maintenance.
Service VRRP backup group: After common VRRP backup groups are bound to an mVRRP
backup group, they become service VRRP backup groups. Service VRRP backup groups do
not need to send VRRP packets to determine their states. The mVRRP backup group sends
VRRP packets to determine its state and the states of all its bound service VRRP backup
groups. A service VRRP backup group can be bound to an mVRRP backup group in either of
the following modes:
Flowdown: The flowdown mode applies to networks on which both upstream and
downstream packets are transmitted over the same path. If the master device in an
mVRRP backup group enters the Backup or Initialize state, the VRRP module instructs
all service VRRP backup groups that are bound to the mVRRP backup group in
flowdown mode to enter the Initialize state.
Unflowdown: The unflowdown mode applies to networks on which upstream and
downstream packets can be transmitted over different paths. If the mVRRP backup group
enters the Backup or Initialize state, the VRRP module instructs all service VRRP
backup groups that are bound to the mVRRP backup group in unflowdown mode to enter
the same state.
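The two binding modes can be summarized in a sketch (state names as strings; illustrative only, not a product interface):

```python
def service_group_state(mvrrp_state, mode):
    """State a bound service VRRP backup group takes on, given the mVRRP
    backup group's state and the binding mode ('flowdown'/'unflowdown')."""
    if mvrrp_state in ("Backup", "Initialize"):
        # flowdown forces Initialize; unflowdown mirrors the mVRRP state
        return "Initialize" if mode == "flowdown" else mvrrp_state
    return mvrrp_state  # Master: service groups are Master in both modes
```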
Multiple service VRRP backup groups can be bound to an mVRRP backup group. However, an mVRRP backup group cannot serve as a service VRRP backup group bound to another mVRRP backup group.
If a physical interface on which a service VRRP backup group is configured goes Down, the status of the
service VRRP backup group becomes Initialize, irrespective of the status of the mVRRP backup group.
Benefits
mVRRP offers the following benefits:
Simplified management. An mVRRP backup group determines the master/backup status
of service VRRP backup groups.
Reduced CPU and bandwidth resource consumption. Service VRRP backup groups do
not need to send VRRP packets.
Background
Virtual Router Redundancy Protocol (VRRP) can monitor the status change only in the
VRRP-enabled interface on the master device. If a VRRP-disabled interface on the master
device or the uplink connecting the interface to a network fails, VRRP cannot detect the fault,
which causes traffic interruptions.
To resolve this issue, configure VRRP to monitor the VRRP-disabled interface status. If a
VRRP-disabled interface on the master device or the uplink connecting the interface to a
network fails, VRRP instructs the master device to reduce its priority to trigger a
master/backup VRRP switchover.
Related Concepts
If a VRRP-disabled interface of a VRRP device goes Down, the VRRP device changes its
VRRP priority in either of the following modes:
Increased mode: The VRRP device increases its VRRP priority by a specified value.
Reduced mode: The VRRP device reduces its VRRP priority by a specified value.
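The two adjustment modes can be sketched as follows. This is an illustration, not device firmware; the function name and the clamping to the configurable VRRP priority range 1-254 (0 and 255 are reserved values) are assumptions for the sketch:

```python
def adjusted_priority(base_priority: int, delta: int, mode: str) -> int:
    """Return the new VRRP priority after a tracked interface goes Down.

    mode: "increased" raises the priority by delta; "reduced" lowers it.
    The result is clamped to the configurable VRRP range 1-254.
    """
    if mode == "increased":
        new = base_priority + delta
    elif mode == "reduced":
        new = base_priority - delta
    else:
        raise ValueError("mode must be 'increased' or 'reduced'")
    return max(1, min(254, new))
```

For example, a device with priority 120 that reduces its priority by 40 ends up at priority 80, which may then be lower than a peer's priority and trigger a switchover.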
Implementation
As shown in Figure 1-211, a VRRP backup group is configured on Device A and Device B.
Device A is the master device, and Device B is the backup device.
Device A is configured to monitor interface 1. If interface 1 fails, Device A reduces its VRRP
priority and sends a VRRP Advertisement packet carrying a reduced priority. After Device B
receives the packet, it checks that its VRRP priority is higher than the received priority and
preempts the Master state.
After interface 1 goes Up, Device A restores the VRRP priority. After Device A receives a
VRRP Advertisement packet carrying Device B's priority in preemption mode, Device A
checks that its VRRP priority is higher than the received priority and preempts the Master
state.
Benefits
The association between VRRP and a VRRP-disabled interface helps trigger a master/backup
VRRP switchover if the VRRP-disabled interface fails or the uplink connecting the interface
to a network fails.
Background
Devices in a VRRP backup group exchange VRRP Advertisement packets to negotiate the
master/backup status and implement backup. If the link between devices in a VRRP backup
group fails, VRRP Advertisement packets cannot be exchanged to negotiate the
master/backup status. A backup device attempts to preempt the Master state only after a period
three times the interval at which VRRP Advertisement packets are sent.
During this period, user traffic is still forwarded to the master device, which results in user
traffic loss.
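The waiting period described above can be sketched using the timer defined in RFC 3768, which additionally adds a priority-dependent skew time so that higher-priority backups wait slightly less; the exact timer on a given device may differ, so treat this as an illustration:

```python
def master_down_interval(adv_interval_s: float, backup_priority: int) -> float:
    """Seconds a backup waits before declaring the master down.

    Per RFC 3768: Skew_Time = (256 - priority) / 256, and
    Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time.
    """
    skew_time = (256 - backup_priority) / 256.0
    return 3 * adv_interval_s + skew_time
```

With a 1-second advertisement interval and a backup priority of 100, the backup waits roughly 3.6 seconds, which is why BFD-based detection is needed for sub-second switchovers.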
Bidirectional Forwarding Detection (BFD) can rapidly detect faults in links or IP routes. BFD
for VRRP enables a master/backup VRRP switchover to be completed within 1 second,
preventing user traffic loss. A BFD session is established between the master and backup
devices in a VRRP backup group and is bound to the VRRP backup group. BFD immediately
detects communication faults in the VRRP backup group and instructs the VRRP backup
group to perform a master/backup switchover, minimizing service interruptions.
Figure 1-212 Association between a VRRP backup group and a common BFD session
Association Between a VRRP Backup Group and Link and Peer BFD Sessions
As shown in Figure 1-213, the master and backup devices monitor the status of link and peer
BFD sessions to identify local or remote faults.
Device A and Device B run VRRP. A peer BFD session is established between Device A and
Device B to detect link and device failures. Link BFD sessions are established between
Device A and Device E and between Device B and Device E to detect link and device failures.
After Device B detects that the peer BFD session goes Down while the Link2 BFD session
remains Up, Device B's VRRP status changes from Backup to Master, and Device B takes over.
Figure 1-213 Association between a VRRP backup group and link and peer BFD sessions
A Link2 fault does not affect Device A's VRRP status, and Device A continues to forward upstream
traffic. However, Device B's VRRP status becomes Master if both the peer BFD session and Link2 BFD
session go Down, and Device B detects the peer BFD session status change before detecting the Link2
BFD session status change. After Device B detects the Link2 BFD session status change, Device B's
VRRP status becomes Initialize.
Figure 1-214 shows the state machine for the association between a VRRP backup group and
link and peer BFD sessions.
Figure 1-214 State machine for the association between a VRRP backup group and link and peer
BFD sessions
The preceding process shows that, after link and peer BFD for VRRP is deployed, the backup
device immediately preempts the Master state if a fault occurs. Link and peer BFD for VRRP
implements a millisecond-level master/backup VRRP switchover.
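The decision logic described above can be summarized in a short sketch; the function and state names are illustrative assumptions based on the behavior described, not the device's actual state machine:

```python
def vrrp_state_on_bfd_change(peer_bfd_up: bool, link_bfd_up: bool,
                             is_master: bool) -> str:
    """Illustrative decision for a device tracking link and peer BFD sessions.

    - Own link BFD session Down: the device cannot forward, so it enters
      Initialize regardless of its previous role.
    - Peer BFD session Down but own link Up: the backup immediately
      preempts the Master state.
    - Otherwise the device keeps its current role.
    """
    if not link_bfd_up:
        return "Initialize"
    if not peer_bfd_up:
        return "Master"  # backup takes over without waiting for timers
    return "Master" if is_master else "Backup"
```

This mirrors the note above: a backup whose own link BFD session is still Up becomes Master when the peer session fails, but enters Initialize once its own link session also goes Down.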
Benefits
BFD for VRRP speeds up master/backup VRRP switchovers if faults occur.
Principles
Metro Ethernet solutions use Virtual Router Redundancy Protocol (VRRP) tracking
Bidirectional Forwarding Detection (BFD) to detect link faults and protect links between the
master and backup network provider edges (NPEs) and between NPEs and user-end provider
edges (UPEs). If UPEs do not support BFD, Metro Ethernet solutions cannot use VRRP
tracking BFD. If UPEs support 802.3ah, Metro Ethernet solutions can use 802.3ah as a
substitute for BFD to detect link faults and protect links between NPEs and UPEs. Ethernet
operation, administration and maintenance (OAM) technologies, such as Ethernet in the First
Mile (EFM) OAM defined in IEEE 802.3ah, provide functions, such as link connectivity
detection, link failure monitoring, remote failure notification, and remote loopback for links
between directly connected devices.
Implementation
EFM can detect only local link failures. If the link between the UPE and NPE1 fails, NPE2 cannot detect
the failure. NPE2 has to wait three VRRP Advertisement packet transmission intervals before it switches
to the Master state. During this period, upstream service traffic is interrupted. To speed up
master/backup VRRP switchovers and minimize the service interruption time, also configure
VRRP to track the peer BFD session.
Figure 1-215 shows a network on which VRRP tracking EFM is configured. NPE1 and NPE2
are configured to belong to a VRRP backup group. A peer BFD session is configured to detect
the faults on the two NPEs and on the link between the two NPEs. An EFM session is
configured between the UPE and NPE1 and between the UPE and NPE2 to detect the faults
on the UPE and NPEs and on the links between the UPE and NPEs. The VRRP backup group
determines the VRRP status of NPEs based on the link status reported by EFM and the peer
BFD session.
In normal circumstances, if the link between the UPE and NPE2 fails, NPE1 remains in the Master state
and continues to forward upstream traffic. However, NPE2's VRRP status changes to Master if NPE2
detects the Down state of the peer BFD session before it detects the Discovery state of the link between
itself and the UPE. After NPE2 detects the Discovery state of the link between itself and the UPE,
NPE2's VRRP status changes from Master to Initialize.
Figure 1-216 shows the state machine for VRRP tracking EFM.
Benefits
VRRP tracking EFM facilitates master/backup VRRP switchovers on a network on which
UPEs do not support BFD but support 802.3ah.
Principles
Virtual Router Redundancy Protocol (VRRP) tracking Ethernet in the First Mile (EFM)
effectively facilitates link fault detection on a network on which UPEs do not support
Bidirectional Forwarding Detection (BFD). However, EFM can detect faults only on
single-hop links. As shown in Figure 1-217, EFM cannot detect faults on the link between
UPE2 and NPE1 or between UPE2 and NPE2.
Implementation
CFM can detect only local link failures. If the link between UPE2 and NPE1 fails, NPE2 cannot detect
the failure. NPE2 has to wait three VRRP Advertisement packet transmission intervals before it switches
to the Master state. During this period, upstream service traffic is interrupted. To speed up
master/backup VRRP switchovers and minimize the service interruption time, also configure
VRRP to track the peer BFD session.
Figure 1-218 shows a network on which VRRP tracks CFM and the peer BFD session.
In normal circumstances, if the link between UPE2 and NPE2 fails, NPE1 remains in the Master state
and continues to forward upstream traffic. However, NPE2's VRRP status changes to Master if NPE2
detects the Down state of the peer BFD session before it detects the Down state of the link between itself
and UPE2. After NPE2 detects the Down state of the link between itself and UPE2, NPE2's VRRP status
changes from Master to Initialize.
Figure 1-219 shows the state machine for VRRP tracking CFM.
Benefits
VRRP tracking CFM prevents service interruptions caused by dual master devices in a VRRP
backup group and facilitates master/backup VRRP switchovers.
Background
To improve network reliability, VRRP can be configured on a device to track the following
objects:
Interface
EFM session
BFD session
Failure of a tracked object can trigger a rapid master/backup VRRP switchover to ensure
service continuity.
In Figure 1-220, however, if Interface 2 on Device C goes Down and its IP address (20.1.1.1)
becomes unreachable, VRRP is unable to detect the fault. As a result, user traffic is dropped.
To resolve the preceding issue, you can associate VRRP with network quality analysis (NQA).
Using test instances, NQA sends probe packets to check the reachability of destination IP
addresses. After VRRP is associated with an NQA test instance, VRRP tracks the NQA test
instance to implement rapid master/backup VRRP switchovers. For the example shown in the
preceding figure, you can configure an NQA test instance on Device A to check whether the
IP address 20.1.1.1 of Interface 2 on Device C is reachable.
VRRP association with an NQA test instance is required on only the local device (Device A).
Implementation
You can configure VRRP association with an NQA test instance to track a gateway router's
uplink, which is a cross-device link. If the uplink fails, NQA instructs VRRP to reduce the
gateway router's priority by a specified value. Reducing the priority enables another gateway
router in the VRRP backup group to take over services and become the master, thereby
ensuring communication continuity between hosts on the LAN served by the gateway and the
external network. After the uplink recovers, NQA instructs VRRP to restore the gateway
router's priority.
Figure 1-221 illustrates VRRP association with an NQA test instance.
Benefits
VRRP association with NQA implements a rapid master/backup VRRP switchover if a
cross-device uplink fails.
Background
To improve device reliability, two user gateways working in master/backup mode are
connected to a network, and VRRP is enabled on these gateways to determine their
master/backup status. If a VRRP backup group has been configured and an uplink route to a
network becomes unreachable, access-side users still use the VRRP backup group to forward
traffic along the uplink route, which causes user traffic loss.
Association between a VRRP backup group and a route can prevent user traffic loss. A VRRP
backup group can be configured to track the uplink route to a network. If the route is
withdrawn or becomes inactive, the route management (RM) module notifies the VRRP
backup group of the change. After receiving the notification, the VRRP backup group changes
its master device's VRRP priority and performs a master/backup switchover. This process
ensures that user traffic can be forwarded along a properly functioning link.
Implementation
A VRRP backup group can be associated with an uplink route to a network to determine
whether the route is reachable. If the uplink route is withdrawn or becomes inactive after the
uplink goes Down or the network topology changes, hosts on a local area network (LAN) fail
to access the external network through gateways. The RM module notifies the VRRP backup
group of the route status change. The VRRP priority of the master device decreases by a
specified value. A backup device with a priority higher than others preempts the Master state
and takes over traffic. This process ensures communication continuity between these hosts
and the external network. After the uplink recovers, the RM module instructs the VRRP
backup group to restore the master device's VRRP priority.
As shown in Figure 1-222, a VRRP backup group is configured on Device A (master) and
Device B (backup), with Device A forwarding user traffic. The VRRP backup group on
Device A is associated with the route 100.1.2.0/24.
When the uplink from Device A to Device C goes Down, the route 100.1.2.0/24 becomes
unreachable and Device A's VRRP priority decreases. Because Device A's reduced VRRP
priority is lower than Device B's VRRP priority, Device B preempts the Master state and takes
over, which prevents user traffic loss.
3. After the uplink recovers, the route 100.1.2.0/24 becomes reachable again, and Device A
restores its VRRP priority and preempts the Master state. After Device B receives VRRP
Advertisement packets and determines that its priority is lower than that of Device A,
Device B returns to the Backup state.
4. Device A in the Master state forwards user traffic, and Device B remains in the Backup
state.
The preceding process shows that the VRRP backup group performs a master/backup
switchover if the uplink route is unreachable.
Benefits
Association between a VRRP backup group and a route helps implement a master/backup
VRRP switchover when an uplink route to a network is unreachable. The association also
ensures that the VRRP backup group performs a traffic switchback and minimizes traffic
downtime.
Background
A VRRP backup group is configured on Device1 and Device2 on the network shown in Figure
1-223. Device1 is a master device, whereas Device2 is a backup device. The VRRP backup
group serves as a gateway for users. User-to-network traffic travels through Device1.
However, network-to-user traffic may travel through Device1, Device2, or both of them over
a path determined by a dynamic routing protocol. Therefore, user-to-network traffic and
network-to-user traffic may travel along different paths, which interrupts services if firewalls
are attached to devices in the VRRP backup group, complicates traffic monitoring or statistics
collection, and increases costs.
To address the preceding problems, the routing protocol is expected to select a route passing
through the master device so that the user-to-network and network-to-user traffic travels along
the same path. Association between direct routes and a VRRP backup group can meet
expectations by allowing the dynamic routing protocol to select a route based on the VRRP
status.
Figure 1-223 Association between direct routes and a VRRP backup group
Related Concepts
Direct route: a 32-bit host route or a network segment route that is generated after a device
interface is assigned an IP address and its protocol status is Up. A device automatically
generates direct routes without using a routing algorithm.
Implementation
Association between direct routes and a VRRP backup group allows VRRP interfaces to
adjust the costs of direct network segment routes based on the VRRP status. The direct route
with the master device as the next hop has the lowest cost. A dynamic routing protocol
imports the direct routes and selects the direct route with the lowest cost. For example, VRRP
interfaces on Device1 and Device2 on the network shown in Figure 1-223 are configured with
association between direct routes and the VRRP backup group. The implementation is as
follows:
Device1 in the Master state sets the cost of its route to the directly connected virtual IP
network segment to 0 (default value).
Device2 in the Backup state increases the cost of its route to the directly connected
virtual IP network segment.
A dynamic routing protocol selects the route with Device1 as the next hop because this route
costs less than the other route. Therefore, both user-to-network and network-to-user traffic
travels through Device1.
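The cost adjustment above can be sketched as follows; the backup's cost increase value is an arbitrary illustration, not a device default:

```python
def direct_route_cost(vrrp_state: str, backup_cost_increase: int = 10) -> int:
    """Cost a VRRP interface assigns to its direct route to the virtual IP
    network segment.

    The master keeps the default cost of 0; a backup raises the cost so
    that a dynamic routing protocol importing both routes prefers the
    route through the master.
    """
    return 0 if vrrp_state == "Master" else backup_cost_increase

# A dynamic routing protocol then simply picks the lowest-cost next hop:
routes = {"Device1": direct_route_cost("Master"),
          "Device2": direct_route_cost("Backup")}
best_next_hop = min(routes, key=routes.get)
```

Because the route through Device1 (Master) has the lower cost, both traffic directions converge on Device1.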
Usage Scenario
In data center scenarios, firewalls are attached to devices in a VRRP backup group to
improve network security. Network-to-user traffic cannot pass through a firewall if it travels
over a path different than the one used by user-to-network traffic.
On an IP radio access network (RAN), VRRP is configured to set the
master/backup status of aggregation site gateways (ASGs) and radio service gateways (RSGs).
Network-to-user and user-to-network traffic may pass through different paths, complicating
network operation and management.
Association between direct routes and a VRRP backup group can address the preceding
problems by ensuring the user-to-network and network-to-user traffic travels along the same
path.
Principles
As shown in Figure 1-224, the base station attached to the cell site gateway (CSG) on a
mobile bearer network accesses aggregation nodes PE1 and PE2 over primary and secondary
pseudo wires (PWs) and accesses PE3 and PE4 over primary and secondary links. PE3 and
PE4 are configured to belong to a Virtual Router Redundancy Protocol (VRRP) backup group.
If PE1 fails, traffic switches from the primary link to the secondary link. Before a
master/backup VRRP switchover is complete, service traffic is temporarily interrupted.
To meet carrier-class reliability requirements, configure devices in the VRRP backup group to
forward traffic even when they are in the Backup state. This configuration can prevent traffic
interruptions in the preceding scenario.
Implementation
As shown in Figure 1-224, upstream traffic travels along the path CSG -> PE1 -> PE3 ->
RNC1/RNC2 in normal circumstances. PE3 is in the Master state, and PE4 in the Backup
state.
If PE1 fails, traffic switches from the primary link between PE1 and PE3 to the secondary link
between PE2 and PE4. Because a primary/secondary link switchover is faster than a
master/backup VRRP switchover:
If PE4 cannot forward traffic, service traffic is temporarily interrupted before the
master/backup VRRP switchover is complete.
If PE4 can forward traffic, PE4 takes over service traffic forwarding even if the
master/backup VRRP switchover is not complete.
Benefits
Traffic forwarding by a backup device improves master/backup VRRP switchover
performance and reduces the service interruption time.
Principles
On the network shown in Figure 1-225, VRRP-enabled NPEs are connected to user-side PEs
through active and standby links. User traffic travels over the active link to the master NPE1,
and NPE1 forwards user traffic to the Internet. If NPE1 is working properly, user traffic
travels over the path UPE -> PE1 -> NPE1. If the active link or NPE1's interface 1 tracked by
the VRRP backup group fails, an active/standby link switchover and a master/backup VRRP
switchover are implemented. After the switchovers, user traffic switches to the path UPE ->
PE1 -> PE2 -> NPE2. After the fault is rectified, an active/standby link switchback and a
master/backup VRRP switchback are implemented. If the active link becomes active before
the original master device restores the Master state, user traffic is interrupted.
To prevent user traffic interruptions, the rapid VRRP switchback function is used to allow the
original master device to switch from the Backup state to the Master state immediately after
the fault is rectified.
Related Concept
A VRRP switchback is a process during which the original master device switches its status
from Backup to Master after a fault is rectified.
Implementation
Rapid VRRP switchback allows the original master device to switch its status from Backup to
Master without using VRRP Advertisement packets to negotiate the status. For example, on
the network shown in Figure 1-225, device configurations are as follows:
A common VRRP backup group is configured on NPE1 and NPE2 that run VRRP. An
mVRRP backup group is configured on directly connected interfaces of NPE1 and NPE2.
The common VRRP backup group is bound to the mVRRP backup group and becomes a
service VRRP backup group. The mVRRP backup group determines the master/backup
status of the service VRRP backup group.
NPE1 has a VRRP priority of 120 and works in the Master state in the mVRRP backup
group.
NPE2 has a VRRP priority of 100 and works in the Backup state in the mVRRP backup
group.
NPE1 tracks interface 1 and reduces its priority by 40 if interface 1 goes Down.
The rapid VRRP switchback process is as follows:
1. If NPE1 is working properly, NPE1 periodically sends VRRP Advertisement packets to
notify NPE2 of the Master state. NPE1 tracks interface 1 connected to the active link.
2. If the active link or interface 1 fails, interface 1 goes Down. The service VRRP backup
group on NPE1 is in the Initialize state. NPE1 reduces its mVRRP priority to 80 (120 -
40). As a result, the mVRRP priority of NPE2 is higher than that of NPE1, and NPE2
immediately preempts the Master state. NPE2 then sends a VRRP Advertisement packet
carrying a higher priority than that of NPE1. After receiving the packet, the mVRRP
backup group on NPE1 stops sending VRRP Advertisement packets and enters the
Backup state. The status of the service VRRP backup group is the same as that of the
mVRRP backup group on NPE2. User traffic switches to the path UPE -> PE1 -> PE2 ->
NPE2.
3. After the fault is rectified, interface 1 goes Up and NPE1 increases its VRRP priority to
120 (80 + 40). NPE1 immediately preempts the Master state and sends VRRP
Advertisement packets to NPE2. User traffic switches back to the path UPE -> PE1 ->
NPE1.
If rapid VRRP switchback is not configured and NPE1 restores its priority to 120, NPE1 has to wait
until it receives VRRP Advertisement packets carrying a lower priority than its own priority from NPE2
before preempting the Master state.
4. NPE1 then sends VRRP Advertisement packets carrying a higher priority than NPE2's
priority. After receiving the VRRP Advertisement packets, NPE2 enters the Backup state.
Both NPE1 and NPE2 restore their previous status.
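The priority arithmetic in the steps above can be traced with a short sketch using the example's values (priority 120, reduction 40, peer priority 100); the class is an illustration, not device code:

```python
class TrackedVrrpDevice:
    """Illustrative model of a VRRP device that tracks an interface."""

    def __init__(self, priority: int, reduce_by: int):
        self.priority = priority
        self.reduce_by = reduce_by

    def interface_down(self) -> int:
        """Tracked interface fails: reduce the priority."""
        self.priority -= self.reduce_by
        return self.priority

    def interface_up(self) -> int:
        """Tracked interface recovers: restore the priority."""
        self.priority += self.reduce_by
        return self.priority

npe1 = TrackedVrrpDevice(priority=120, reduce_by=40)
npe2_priority = 100

npe1.interface_down()                                  # 120 - 40 = 80
npe2_preempts = npe1.priority < npe2_priority          # NPE2 becomes Master

npe1.interface_up()                                    # 80 + 40 = 120
npe1_preempts_back = npe1.priority > npe2_priority     # NPE1 switches back
```

With rapid switchback, NPE1 preempts the Master state as soon as its priority again exceeds NPE2's, without first waiting for an Advertisement packet from NPE2.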
Usage Scenario
Rapid VRRP switchback applies to a specific network with all of the following
characteristics:
The master device in an mVRRP backup group tracks a VRRP-disabled interface or
feature and reduces its VRRP priority if the interface or feature status becomes Down.
Devices in a VRRP backup group are connected to user-side devices over the active and
standby links.
Benefits
Rapid VRRP switchback speeds up a VRRP switchback after a fault is rectified.
1.5.6.3 Applications
1.5.6.3.1 IPRAN Gateway Protection Solution
Service Overview
NodeBs and radio network controllers (RNCs) on an IP radio access network (IPRAN) do not
have dynamic routing capabilities. Static routes must be configured to allow NodeBs to
communicate with access aggregation gateways (AGGs) and allow RNCs to communicate
with radio service gateways (RSGs) at the aggregation level. To ensure that various
value-added services, such as voice, video, and cloud computing, are not interrupted on
mobile bearer networks, a VRRP backup group can be deployed to implement gateway
redundancy. When the master device in a VRRP backup group goes Down, a backup device
takes over, ensuring normal service transmission and enhancing device reliability at the
aggregation layer.
Networking Description
Figure 1-226 shows the network for the IPRAN gateway protection solution. A NodeB is
connected to AGGs over an access ring or is dual-homed to two AGGs. The cell site gateways
(CSGs) and AGGs are connected using the pseudo wire emulation edge-to-edge (PWE3)
technology, which ensures connection reliability. Two VRRP backup groups can be
configured on the AGGs and RSGs to implement gateway backup for the NodeB and RNC,
respectively.
Feature Deployment
Table 1-63 describes VRRP-based gateway protection applications on an IPRAN.
When AGG1 recovers, it becomes the master device after a specified preemption delay
elapses. AGG2 then becomes the backup device. Traffic sent from the NodeB goes through
the CSGs to AGG1 over the previous primary PW. AGG1 sends the traffic to RSG1 through
the P device. RSG1 then sends the traffic to the RNC. The path for user-to-network traffic is
CSG -> AGG1 -> P -> RSG1 -> RNC, and the path for network-to-user traffic is RNC ->
RSG1 -> P -> AGG1 -> CSG.
Benefits
P2P EFM, E2E CFM, E2E Y.1731, and their combinations are used to provide a complete
Ethernet OAM solution, which brings the following benefits:
Ethernet is deployed near user premises using remote terminals and roadside cabinets at
remote central offices or in unattended areas. Ethernet OAM allows engineers to run
detection, diagnosis, and monitoring protocols and techniques from remote locations to
maintain Ethernet networks. Remote maintenance eliminates the need for onsite
maintenance and helps reduce maintenance and operation expenditures.
Ethernet OAM supports various performance monitoring tools that are used to monitor
network operation and assess service quality based on SLAs. If a device using the tools
detects faults, the device sends traps to a network management system (NMS). Carriers
use statistics and trap information on NMSs to adjust services. The tools help ensure
proper transmission of voice and data services.
OAMPDUs
EFM works at the data link layer and uses protocol packets called OAM protocol data units
(OAMPDUs). EFM devices periodically exchange OAMPDUs to report link status, helping
network administrators effectively manage networks. Figure 1-230 shows the format and
common types of OAMPDUs. Table 1-65 lists and describes fields in an OAMPDU.
Field Description
Dest addr: Destination MAC address, which is the slow-protocol multicast address
0180-C200-0002. Network bridges cannot forward slow-protocol packets, so EFM
OAMPDUs cannot be forwarded across multiple devices, even if OAM is supported or
enabled on the devices.
Source addr: Source address, which is the unicast MAC address of a port on the
transmit end. If no port MAC address is specified on the transmit end, the bridge
MAC address of the transmit end is used.
Type: Slow protocol type, which has a fixed value of 0x8809.
Subtype: Subtype of the slow protocol. The value 0x03 indicates that the slow
sub-protocol is EFM.
Flags: Status of an EFM entity. The possible flags are Remote Stable, Remote
Evaluating, Local Stable, Local Evaluating, Critical Event, and Link Fault.
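The fixed leading fields can be illustrated by packing an OAMPDU header; the field widths for the flags (2 bytes) and code (1 byte) follow IEEE 802.3ah, and the function name is an assumption of this sketch:

```python
import struct

SLOW_PROTOCOL_MULTICAST = bytes.fromhex("0180c2000002")  # fixed dest address
SLOW_PROTOCOL_ETHERTYPE = 0x8809                          # slow protocols
EFM_SUBTYPE = 0x03                                        # EFM sub-protocol

def build_oampdu_header(src_mac: bytes, flags: int, code: int) -> bytes:
    """Pack the fixed leading fields of an EFM OAMPDU.

    Layout, per the field table above:
    dest MAC (6) | src MAC (6) | EtherType (2) | subtype (1) |
    flags (2) | code (1).
    """
    return (SLOW_PROTOCOL_MULTICAST + src_mac
            + struct.pack("!HBHB", SLOW_PROTOCOL_ETHERTYPE,
                          EFM_SUBTYPE, flags, code))
```

Because the destination is a slow-protocol multicast address, any frame built this way is consumed by the directly connected peer and never forwarded by a bridge.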
Connection Modes
EFM supports two connection modes: active and passive. Table 1-67 describes capabilities of
processing OAMPDUs in the two modes.
An EFM connection can be initiated only by an OAM entity working in active mode. An OAM
entity working in passive mode waits to receive a connection request from its peer entity. Two OAM
entities that both work in passive mode cannot establish an EFM connection between them.
An OAM entity that is to initiate a loopback request must work in active mode.
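The connection rule in the note above reduces to a one-line check; a minimal sketch with illustrative naming:

```python
def can_establish_efm_connection(local_mode: str, peer_mode: str) -> bool:
    """An EFM connection requires at least one OAM entity in active mode.

    Two entities that both work in passive mode cannot establish a
    connection, because neither may initiate one.
    """
    return "active" in (local_mode, peer_mode)
```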
1.5.7.2.2 Background
As telecommunication technologies develop quickly and the demand for service diversity is
increasing, various user-oriented teleservices are being provided over digital and intelligent
media through broadband paths. Backbone network technologies, such as synchronous digital
hierarchy (SDH), asynchronous transfer mode (ATM), passive optical network (PON), and
dense wavelength division multiplexing (DWDM), grow mature and popular. The
technologies allow the voice, data, and video services to be transmitted over a single path to
every home. Telecommunication experts and carriers focus on using existing network
resources to support new types of services and improve the service quality. The key point is to
provide a solution to the last-mile link to a user network.
A "last mile" reliability solution also needs to be provided. High-end clients, such as banks
and financial companies, demand high reliability. They expect carriers to monitor both carrier
networks and last-mile links that connect users to those carrier networks. EFM can be used to
satisfy these demands.
On the network shown in Figure 1-231, EFM is an OAM mechanism that applies to the
last-mile Ethernet access links to users. Carriers use EFM to monitor link status in real time,
rapidly locate failed links, and identify fault types if faults occur. OAM entities exchange
various OAMPDUs to monitor link connectivity and locate link faults.
OAM Discovery
During the discovery phase, a local EFM entity discovers and establishes a stable EFM
connection with a remote EFM entity. Figure 1-233 shows the discovery process.
Link Monitoring
Monitoring Ethernet links is difficult if network performance deteriorates while traffic is
being transmitted over physical links. To resolve this issue, the EFM link monitoring function
can be used. This function can detect data link layer faults in various environments. EFM
entities that are enabled with link monitoring exchange Event Notification OAMPDUs to
monitor links.
If an EFM entity receives a link event listed in Table 1-68, it sends an Event Notification
OAMPDU to notify the remote EFM entity of the event and also sends a trap to an NMS.
After receiving the trap on the NMS, an administrator can determine the network status and
take remedial measures as needed.
Errored symbol period event: If the number of symbol errors that occur on a
device's interface during a specified period of time reaches a specified upper
limit, the device generates an errored symbol period event, advertises the event
to the remote device, and sends a trap to the NMS. This event helps the device
detect code errors during data transmission at the physical layer.
Errored frame event: If the number of frame errors that occur on a device's
interface during a specified period of time reaches a specified upper limit, the
device generates an errored frame event, advertises the event to the remote
device, and sends a trap to the NMS. This event helps the device detect frame
errors that occur during data transmission at the MAC sublayer.
Errored frame seconds summary event: An errored frame second is a one-second
interval wherein at least one frame error is detected. If the number of errored
frame seconds that occur during a specified period of time reaches a specified
upper limit on a device's interface, the device generates an errored frame
seconds summary event, advertises the event to the remote device, and sends a
trap to the NMS. This event helps the device detect errored frame seconds that
occur during data transmission at the MAC sublayer.
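All of these link events follow the same count-over-window pattern; a minimal sketch with illustrative names and thresholds (not the device's actual monitoring code):

```python
class LinkEventMonitor:
    """Illustrative count-over-window detector for an EFM link event."""

    def __init__(self, threshold: int):
        self.threshold = threshold  # configured upper limit per window
        self._count = 0             # errors seen in the current window

    def record_error(self) -> None:
        """Count one symbol/frame error observed in the current window."""
        self._count += 1

    def end_of_window(self) -> bool:
        """Close the window. True means: generate the event, send an
        Event Notification OAMPDU to the peer, and send a trap to the NMS."""
        triggered = self._count >= self.threshold
        self._count = 0
        return triggered
```

For example, with a threshold of 3 errors per window, two errors in a window generate no event, while three or more do.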
Fault Notification
After the OAM discovery phase finishes, two EFM entities at both ends of an EFM
connection exchange Information OAMPDUs to monitor link connectivity. If traffic is
interrupted due to a remote device failure, the remote EFM entity sends an Information
OAMPDU carrying an event listed in Table 1-69 to the local EFM entity. After receiving the
notification, the local EFM entity sends a trap to the NMS. An administrator can view the trap
on the NMS to determine link status and take measures to rectify the fault.
Remote Loopback
Figure 1-234 demonstrates the principles of remote loopback. When a local interface sends
non-OAMPDUs to a remote interface, the remote interface loops the non-OAMPDUs back to
the local interface, not to the destination addresses of the non-OAMPDUs. This process is
called remote loopback. An EFM connection must be established to implement remote
loopback.
A device enabled with remote loopback discards all data frames except OAMPDUs, causing a
service interruption. To prevent impact on services, use remote loopback to check link
connectivity and quality before a new network is used or after a link fault is rectified.
The local device calculates communication quality parameters such as the packet loss ratio on
the current link based on the numbers of sent and received packets. Figure 1-235 shows the
remote loopback process.
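The loss-ratio calculation mentioned above is straightforward; a minimal sketch (the function name is illustrative):

```python
def loopback_loss_ratio(frames_sent: int, frames_received: int) -> float:
    """Packet loss ratio measured during remote loopback: the fraction of
    looped-back test frames that never returned to the local device."""
    if frames_sent == 0:
        return 0.0
    return (frames_sent - frames_received) / frames_sent
```

For example, 1000 frames sent with 990 looped back gives a loss ratio of 1%.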
If the local device attempts to stop remote loopback, it sends a message to instruct the remote
device to disable remote loopback. After receiving the message, the remote device disables
remote loopback.
If remote loopback is left enabled, the remote device keeps looping back service data, causing
a service interruption. To prevent this issue, a capability can be configured to disable remote
loopback automatically after a specified timeout period. After the timeout period expires, the
local device automatically sends a message to instruct the remote device to disable remote
loopback.
Maintenance Domain
MDs are discrete areas within which connectivity fault detection is enabled. The boundary of
an MD is determined by MEPs configured on interfaces. An MD is identified by an MD
name.
To help locate faults, MDs are divided into levels 0 through 7. A larger value indicates a
higher level, and a higher level MD covers a larger area. One MD can be tangential to another MD.
Tangential MDs share a single device and this device has one interface in each of the MDs. A
lower level MD can be nested in a higher level MD. An MD must be fully nested in another
MD, and the two MDs cannot overlap. A higher level MD cannot be nested in a lower level
MD.
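The nesting constraints above can be expressed as a small check. In this sketch an MD's coverage is modeled as an abstract interval, which is a simplification for illustration, not how the device represents MDs:

```python
def md_nesting_valid(outer_level: int, inner_level: int,
                     outer_span: tuple, inner_span: tuple) -> bool:
    """An MD nested in another must have a lower level and must be fully
    contained in the outer MD's coverage (no partial overlap)."""
    for lvl in (outer_level, inner_level):
        if not 0 <= lvl <= 7:
            raise ValueError("MD levels range from 0 to 7")
    fully_nested = (outer_span[0] <= inner_span[0]
                    and inner_span[1] <= outer_span[1])
    return inner_level < outer_level and fully_nested
```

A level 3 MD fully inside a level 6 MD is valid; a level 6 MD cannot be nested in a level 3 MD, and partial overlap is never allowed.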
Classifying MDs based on levels facilitates fault diagnosis. MD2 is nested in MD1 on the
network shown in Figure 1-236. If a fault occurs in MD1, PE2 through PE6 and all the links
between the PEs are checked. If no fault is detected in MD2, PE2, PE3, and PE4 are working
properly. This means that the fault is on PE5, PE6, or PE7 or on a link between these PEs.
In actual network scenarios, a nested MD can monitor the connectivity of the higher level MD
in which it is nested. Level settings allow 802.1ag packets to transparently travel through a
nested MD. For example, on the network shown in Figure 1-236, MD2 with the level set to 3
is nested in MD1 with the level set to 6. 802.1ag packets must transparently pass through
MD2 to monitor the connectivity of MD1. The level setting allows 802.1ag packets to pass
through MD2 to monitor the connectivity of MD1 but prevents 802.1ag packets that monitor
MD2 connectivity from passing through MD1. Setting levels for MDs helps locate faults.
802.1ag packets are exchanged and CFM functions are implemented based on MDs. Properly
planned MDs help a network administrator locate faults.
Default MD
A single default MD with the highest priority can be configured for each device according to
IEEE Std 802.1ag-2007.
On the network shown in Figure 1-237, if default MDs with the same level as the higher level
MDs are configured on devices in lower level MDs, MIPs are generated based on the default
MDs to reply to requests sent by devices in higher level MDs. CFM detects topology changes
and monitors the connectivity of both higher and lower level MDs.
The default MD must have a higher level than all MDs to which MEPs configured on the
local device belong. The default MD must also be of the same level as a higher level MD. The
default MD transmits high level request messages and generates MIPs to send responses.
Standard 802.1ag-2007 states that one default MD can be configured on each device and
associated with multiple virtual local area networks (VLANs). VLAN interfaces can
automatically generate MIPs based on the default MDs and a creation rule.
Maintenance Association
Multiple MAs can be configured in an MD as needed. Each MA contains MEPs. An MA is
uniquely identified by an MD name and an MA name.
An MA serves a specific service such as VLAN. A MEP in an MA sends packets carrying tags
of the specific service and receives packets sent by other MEPs in the MA.
Inward-facing MEP: sends packets through interfaces on the device other than the
interface on which the MEP is configured.
Outward-facing MEP: sends packets out of the interface on which the MEP is
configured.
Figure 1-238 shows inward- and outward-facing MEPs.
MIPs are separately calculated in each service instance, such as a VLAN. Within a single
service instance, the MAs belong to MDs of different levels but share the same VLAN ID.
For each service instance of each interface, the device attempts to calculate a MIP from the
lowest level MEP based on the rules listed in Table 1-70 and the following conditions:
Each MD on a single interface has a specific level and is associated with multiple
creation rules. The creation rule with the highest priority applies. An explicit rule has a
higher priority than a default rule, and a default rule takes precedence over a none rule.
The level of a MIP must be higher than any MEP on the same interface.
An explicit rule applies to an interface only when MEPs are configured on the interface.
A single MIP can be generated on a single interface. If multiple rules for generating
MIPs with different levels can be used, a MIP with the lowest level is generated.
MIP creation rules help detect and locate faults by level.
For example, CCMs are sent to detect a fault in a level 7 MD on the network shown in Figure
1-239. Loopback or linktrace is used to locate the fault in the link between MIPs that are in a
level 5 MD. This process is repeated until the faulty link or device is located.
The following example illustrates how to create a MIP based on a default rule defined in IEEE
Std 802.1ag-2007.
On the network shown in Figure 1-240, MD1 through MD5 are nested in MD7, and MD2
through MD5 are nested in MD1. MD7 has a higher level than MD1 through MD5, and MD1
has a higher level than MD2 through MD5. Multiple MEPs are configured on Device A in
MD1, and the MEPs belong to MDs with different levels.
A default rule is configured on Device A to create a MIP in MD1. The procedure for creating
the MIP is as follows:
1. Device A compares MEP levels and finds the MEP at level 5, the highest level. The
MEP level is determined by the level of the MD to which the MEP belongs.
2. Device A selects the MD at level 6, which is higher than the MEP of level 5.
3. Device A generates a MIP at level 6.
If MDs at level 6 or higher do not exist, no MIP is generated.
If MIPs at level 1 already exist on Device A, MIPs at level 6 cannot be generated.
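The default-rule procedure above can be sketched as follows, assuming MEP and MD levels are given as plain integers (an illustration, not product code):

```python
def mip_level(mep_levels, md_levels):
    """Return the level of the MIP generated by the default rule, or None.

    The MIP level must be higher than every MEP level on the interface,
    and when several MD levels qualify, the lowest one is chosen because
    only a single MIP can be generated on a single interface."""
    highest_mep = max(mep_levels) if mep_levels else -1
    candidates = [lvl for lvl in md_levels if lvl > highest_mep]
    return min(candidates) if candidates else None
```

With MEPs at levels 1, 3, and 5 and MDs at levels 6 and 7, the result is 6, matching the Device A example; if no MD is at a level above 5, no MIP is generated.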
Hierarchical MP Maintenance
MEPs and MIPs are maintenance points (MPs). MPs are configured on interfaces and belong
to specific MAs, as shown in Figure 1-241.
The scope of maintenance performed and the types of maintenance services depend on the
need of the organizations that use carrier-class Ethernet services. These organizations include
leased line users, service providers, and network carriers. Users purchase Ethernet services
from service providers, and service providers use their networks or carrier networks to
provide E2E Ethernet services. Carriers provide transport services.
Figure 1-242 shows locations of MEPs and MIPs and maintenance domains for users, service
providers, and carriers.
Operator 1, operator 2, the service provider, and the customer use MDs with levels 3, 4, 5, and
6, respectively. A higher MD level indicates a larger MD.
Field Description
0x01 Continuity check message (CCM): used for monitoring E2E link connectivity.
1.5.7.3.2 Background
IP-layer mechanisms, such as Simple Network Management Protocol (SNMP), IP ping, and
IP traceroute, are used to manage network-wide services, detect faults, and monitor
performance on traditional Ethernet networks. These mechanisms are unsuitable for
client-layer E2E Ethernet operation and management.
CFM supports service management, fault detection, and performance monitoring on the E2E
Ethernet network. In Figure 1-244:
A network is logically divided into maintenance domains (MDs). For example, network
devices that a single Internet service provider (ISP) manages are in a single MD to
distinguish between ISP and user networks.
Two maintenance association end points (MEPs) are configured on both ends of a
management network segment to be maintained to determine the boundary of an MD.
Maintenance association intermediate points (MIPs) can be configured as needed. A
MEP initiates a test request, and the remote MEP (RMEP) or MIP responds to the request.
This process provides information about the management network segment to help detect
faults.
CFM supports level-specific MD management. An MD at a given level can manage MDs at
lower levels but cannot manage an MD at a higher level than its own. Level-specific MD
management is used to maintain a service flow based on level-specific MDs and different
types of service flows in an MD.
Continuity Check
CC monitors the connectivity of links between MEPs. A MEP periodically sends multicast
continuity check messages (CCMs) to an RMEP in the same MA. If an RMEP does not
receive a CCM within a period 3.5 times the interval at which CCMs are sent, the RMEP
considers the path between itself and the MEP faulty.
Figure 1-245 CC
After receiving a CCM from a lower level MD, a MEP does not forward this CCM. This
process prevents a lower level CCM from being sent to a higher level MD.
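The 3.5-interval detection rule described above can be sketched as a simple timeout check (the names are illustrative):

```python
def rmep_path_faulty(last_ccm_time: float, now: float,
                     ccm_interval: float) -> bool:
    """An RMEP declares the path to its MEP faulty if no CCM arrives
    within 3.5 times the interval at which CCMs are sent."""
    return (now - last_ccm_time) > 3.5 * ccm_interval
```

With a 1-second CCM interval, the RMEP declares a fault once more than 3.5 seconds pass without a CCM.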
Loopback
Loopback is also called 802.1ag MAC ping. Similar to IP ping, loopback monitors the
connectivity of a path between a local MEP and an RMEP.
A MEP initiates an 802.1ag MAC ping test to monitor the reachability of an RMEP or MIP
destination address. The MEP, MIP, and RMEP have the same level and they can share an MA
or be in different MAs. The MEP sends Loopback messages (LBMs) to the RMEP or MIP.
After receiving the messages, the RMEP or MIP replies with loopback replies (LBRs).
Loopback helps locate a faulty node because a faulty node cannot send an LBR in response to
an LBM. LBMs and LBRs are unicast packets.
The following example illustrates the implementation of loopback on the network shown in
Figure 1-246.
CFM is configured to monitor a path between PE1 (MEP1) and PE4 (MEP2). The MD level
of these MEPs is 6. A MIP with a level of 6 is configured on PE2 and PE3. If a fault is
detected in a link between PE1 and PE4, loopback can be used to locate the fault. Figure
1-247 illustrates the loopback process.
MEP1 can measure the network delay based on 802.1ag MAC ping results or the frame loss
ratio based on the difference between the number of LBMs and the number of LBRs.
Linktrace
Linktrace is also called 802.1ag MAC trace. Similar to IP traceroute, linktrace identifies a
path between two MEPs.
A MEP initiates an 802.1ag MAC trace test to monitor a path to an RMEP or MIP destination
address. The MEP, MIP, and RMEP have the same level and they can share an MA or be in
different MAs. A source MEP constructs and sends a Linktrace message (LTM) to a
destination MEP. After receiving this message, each MIP forwards it and replies with a
linktrace reply (LTR). Upon receipt, the destination MEP replies with an LTR and does not
forward the LTM. The source MEP obtains topology information about each hop on the path
based on the LTRs. LTMs are multicast packets and LTRs are unicast packets.
The following example illustrates the implementation of linktrace on the network shown in
Figure 1-248.
1. MEP1 sends MEP2 an LTM carrying a time to live (TTL) value and the MAC address of
the destination MEP2.
2. After the LTM arrives at MIP1, MIP1 reduces the TTL value in the LTM by 1 and
forwards the LTM if the TTL is not zero. MIP1 then replies with an LTR to MEP1. The
LTR carries forwarding information and the TTL value carried by the LTM when MIP1
received it.
3. After the LTM reaches MIP2 and MEP2, the process described above for MIP1 is
repeated for MIP2 and MEP2. In addition, MEP2 finds that its MAC address is the
destination address carried in the LTM and therefore does not forward the LTM.
4. The LTRs from MIP1, MIP2, and MEP2 provide MEP1 with information about the
forwarding path between MEP1 and MEP2.
If a fault occurs on the path between MEP1 and MEP2, MEP2 or a MIP cannot receive
the LTM or reply with an LTR. MEP1 can locate the faulty node based on such a
response failure. For example, if the link between MEP1 and MIP2 works properly but
the link between MIP2 and MEP2 fails, MEP1 can receive LTRs from MIP1 and MIP2
but fails to receive a reply from MEP2. MEP1 then considers the path between MIP2 and
MEP2 faulty.
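The LTM forwarding and fault-location behavior described above can be simulated in a short sketch. The path and node names are hypothetical, and a faulty link is modeled by truncating the set of reachable nodes:

```python
def linktrace(path, src, dst, ttl=64, reachable=None):
    """Simulate LTM forwarding: each MIP decrements the TTL, replies
    with an LTR, and forwards the LTM; the destination MEP replies but
    does not forward. Returns the (responder, ttl) pairs the source
    MEP collects, which reveal the forwarding path."""
    reachable = set(path if reachable is None else reachable)
    ltrs = []
    for node in path[path.index(src) + 1:]:
        if node not in reachable or ttl == 0:
            break                      # faulty link or TTL exhausted: no reply
        ttl -= 1
        ltrs.append((node, ttl))       # LTR identifies the responding node
        if node == dst:
            break                      # destination MEP does not forward
    return ltrs
```

On the healthy path MEP1-MIP1-MIP2-MEP2, LTRs arrive from all three downstream nodes; if the MIP2-MEP2 link fails, only MIP1 and MIP2 reply, so MEP1 locates the fault between MIP2 and MEP2.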
Alarm Types
If CFM detects a fault in an E2E link, it triggers an alarm and sends the alarm to the network
management system (NMS). A network administrator uses the information to troubleshoot.
Table 1-74 describes alarms supported by CFM.
Alarm Anti-jitter
Multiple alarms and clear alarms may be generated on an unstable network enabled with CC.
These alarms consume system resources and deteriorate system performance. An RMEP
activation time can be set to prevent false alarms, and an alarm anti-jitter time can be set to
limit the number of alarms generated.
Function Description
RMEP activation time: Prevents false alarms. A local MEP with the ability to receive CCMs
can accept CCMs only after the RMEP activation time elapses.
Alarm anti-jitter time: If a MEP detects a connectivity fault, it sends an alarm to the NMS
after the anti-jitter time elapses. It does not send an alarm if the fault is rectified before the
anti-jitter time elapses.
Alarm Suppression
If different types of faults trigger more than one alarm, CFM alarm suppression allows the
alarm with the highest level to be sent to the NMS. If alarms persist after the alarm with the
highest level is cleared, the alarm with the second highest level is sent to the NMS. The
process repeats until all alarms are cleared.
The principles of CFM alarm suppression are as follows:
Alarms with high levels require immediate troubleshooting.
A single fault may trigger alarms with different levels. After the alarm with the highest
level is cleared, alarms with lower levels may also be cleared.
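The suppression rule above reduces to selecting the highest-level active alarm. In this sketch, alarms are (name, level) pairs and a higher level number is assumed to mean a more severe alarm; the encoding is illustrative:

```python
def alarm_to_report(active_alarms):
    """CFM alarm suppression: only the highest-level active alarm is sent
    to the NMS. When it is cleared, the next call reports the alarm with
    the second highest level, and so on until all alarms are cleared."""
    if not active_alarms:
        return None
    return max(active_alarms, key=lambda alarm: alarm[1])
```

Each time the reported alarm is cleared and removed from the active set, calling the function again yields the next alarm to send.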
Figure 1-249 shows typical Y.1731 networking. Y.1731 performance monitoring tools can be
used to assess the quality of the purchased Ethernet tunnel services or help a carrier conduct
regular service level agreement (SLA) monitoring.
Function Overview
Y.1731 can manage fault information and monitor performance.
Fault management functions include continuity check (CC), loopback (LB), and linktrace
(LT). The principles of Y.1731 fault management are the same as those of CFM fault
management.
Performance monitoring functions include single- and dual-ended frame loss
measurement, one- and two-way frame delay measurement, alarm indication signal
(AIS), Ethernet test function (ETH-Test), Single-ended Synthetic Loss Measurement
(SLM), Ethernet lock signal function (ETH-LCK), ETH-BN on virtual private LAN
service (VPLS) networks, virtual leased line (VLL) networks, and virtual local area
networks (VLANs). Kompella VPLS and VLL scenarios support only AIS.
One-way Frame Delay Measurement: Measures the network delay on a unidirectional link
between MEPs.
Two-way Frame Delay Measurement: Measures the network delay on a bidirectional link
between MEPs.
To measure the link delay, select either one- or two-way frame delay measurement:
One-way frame delay measurement can be used to measure the delay on a unidirectional
link between a MEP and its RMEP. The MEP must synchronize its time with its RMEP.
Two-way frame delay measurement can be used to measure the delay on a bidirectional
link between a MEP and its RMEP. The MEP does not need to synchronize its time with
its RMEP.
ETH-LCK: Informs the server-layer (sub-layer) MEP of administrative locking and the
interruption of traffic destined for the MEP in the inner maintenance domain (MD). The
ETH-LCK function must work with the ETH-Test function.
Single-ended Synthetic Loss Measurement (SLM): Collects frame loss statistics on
point-to-multipoint or E-Trunk links to monitor link quality. Single-ended synthetic frame
LM is used to collect accurate frame loss statistics on point-to-multipoint links.
ETH-LM
Ethernet frame loss measurement (ETH-LM) enables a local MEP and its RMEP to exchange
ETH-LM frames to collect frame loss statistics on E2E links. ETH-LM modes are classified
as near- or far-end ETH-LM.
Near-end ETH-LM applies to an inbound interface, and far-end ETH-LM applies to an
outbound interface on a MEP. ETH-LM counts the number of errored frame seconds to
determine the duration during which a link is unavailable.
ETH-LM supports the following methods:
Single-ended frame loss measurement
This method measures frame loss proactively or on demand.
− On-demand measurement collects single-ended frame loss statistics at a time or a
specific number of times for diagnosis.
− Proactive measurement collects single-ended frame loss statistics periodically.
A local MEP sends a loss measurement message (LMM) carrying an ETH-LM request to
its RMEP. After receiving the request, the RMEP responds with a loss measurement
reply (LMR) carrying an ETH-LM response. Figure 1-250 illustrates the process for
single-ended frame loss measurement.
After single-ended frame loss measurement is enabled, a MEP on PE1 sends an RMEP
on PE2 an ETH-LMM carrying an ETH-LM request. The MEP then receives an
ETH-LMR message carrying an ETH-LM response from the RMEP on PE2. The
ETH-LMM carries TxFCf, the value of the local transmit counter TxFCl at the time when
the message was sent by the local MEP. After receiving the ETH-LMM,
PE2 replies with an ETH-LMR message, which carries the following information:
− TxFCf: copied from the ETH-LMM
− RxFCf: value of the local counter RxFCl at the time of ETH-LMM reception
− TxFCb: value of the local counter TxFCl at the time of ETH-LMM transmission
After receiving the ETH-LMR message, PE1 measures near- and far-end frame loss
based on the following values:
− TxFCf, RxFCf, and TxFCb values from the received ETH-LMR message, and the value
of the local counter RxFCl at the time when this ETH-LMR message was received. These
values are represented as TxFCf[tc], RxFCf[tc], TxFCb[tc], and RxFCl[tc].
tc is the time when this ETH-LMR message was received.
− TxFCf, RxFCf, and TxFCb values from the previously received ETH-LMR message,
and the value of the local counter RxFCl at the time when that ETH-LMR message was
received. These values are represented as TxFCf[tp], RxFCf[tp], TxFCb[tp], and
RxFCl[tp].
tp is the time when the previous ETH-LMR message was received.
Far-end frame loss = |TxFCf[tc] - TxFCf[tp]| - |RxFCf[tc] - RxFCf[tp]|
Near-end frame loss = |TxFCb[tc] - TxFCb[tp]| - |RxFCl[tc] - RxFCl[tp]|
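The two formulas above can be checked with a short calculation over two consecutive LMR snapshots; representing each snapshot as a dict of counter values is an assumption of this sketch:

```python
def single_ended_frame_loss(cur, prev):
    """Far- and near-end frame loss from two consecutive ETH-LMR
    snapshots. Each snapshot maps the counter names TxFCf, RxFCf,
    TxFCb, and RxFCl to their values at LMR reception time."""
    far = abs(cur["TxFCf"] - prev["TxFCf"]) - abs(cur["RxFCf"] - prev["RxFCf"])
    near = abs(cur["TxFCb"] - prev["TxFCb"]) - abs(cur["RxFCl"] - prev["RxFCl"])
    return far, near
```

For example, if 100 frames were sent toward the RMEP but only 95 arrived, and 100 were sent back but only 98 arrived, the result is a far-end loss of 5 and a near-end loss of 2.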
Service packets are prioritized based on 802.1p priorities and are transmitted using
different policies. Traffic passing through the P device on the network shown in Figure
1-251 carries 802.1p priorities of 1 and 2.
Single-ended frame loss measurement is enabled on PE1 to send traffic with a priority of
1 to measure frame loss on a link between PE1 and PE2. Traffic with a priority of 2 is
also sent. After receiving traffic with priorities of 1 and 2, the P device forwards traffic
with a higher priority, delaying the arrival of traffic with a priority of 1 at PE2. As a
result, the frame loss ratio is inaccurate.
Dual-ended frame loss measurement
After dual-ended frame loss measurement is configured, each MEP periodically sends a
CCM carrying a request to its RMEP. After receiving the CCM, the RMEP collects near-
and far-end frame loss statistics but does not forward the message. The CCM carries the
following information:
− TxFCf: value of the local counter TxFCl at the time of CCM transmission
− RxFCb: value of the local counter RxFCl at the time of the reception of the last
CCM
− TxFCb: value of TxFCf in the last received CCM
PE1 uses received information to measure near- and far-end frame loss based on the
following values:
− TxFCf, RxFCb, and TxFCb values from the received CCM, and the value of the local
counter RxFCl at the time when this CCM was received. These values are represented as
TxFCf[tc], RxFCb[tc], TxFCb[tc], and RxFCl[tc].
tc is the time when this CCM was received.
− TxFCf, RxFCb, and TxFCb values from the previously received CCM, and the value of
the local counter RxFCl at the time when that CCM was received. These values are
represented as TxFCf[tp], RxFCb[tp], TxFCb[tp], and RxFCl[tp].
tp is the time when the previous CCM was received.
Far-end frame loss = |TxFCb[tc] - TxFCb[tp]| - |RxFCb[tc] - RxFCb[tp]|
Near-end frame loss = |TxFCf[tc] - TxFCf[tp]| - |RxFCl[tc] - RxFCl[tp]|
ETH-DM
Delay measurement (DM) measures the delay and its variation. A MEP sends its RMEP a
message carrying ETH-DM information and receives a response message carrying ETH-DM
information from its RMEP.
ETH-DM supports the following modes:
One-way frame delay measurement
A MEP sends its RMEP a 1DM message carrying one-way ETH-DM information. After
receiving this message, the RMEP measures the one-way frame delay and its variation.
If a MEP synchronizes its time with its RMEP, both the one-way frame delay and its
variation can be measured. If the time is not synchronized, only the one-way delay
variation can be measured.
One-way frame delay measurement can be implemented in either of the following
modes:
− On-demand measurement: calculates the one-way frame delay at a time or a
specific number of times for diagnosis.
− Proactive measurement: calculates the one-way frame delay periodically.
Figure 1-253 illustrates the process for one-way frame delay measurement.
One-way frame delay measurement is implemented on an E2E link between a local MEP
and its RMEP. The local MEP sends 1DMs to the RMEP and then receives replies from
the RMEP. After one-way frame delay measurement is configured, a MEP periodically
sends 1DMs carrying TxTimeStampf (the time when the 1DM was sent). After receiving
the 1DM, the RMEP parses TxTimeStampf and compares this value with RxTimef (the
time when the DM frame was received). The RMEP calculates the one-way frame delay
based on these values using the following equation:
Frame delay = RxTimef - TxTimeStampf
The frame delay can be used to measure the delay variation.
A delay variation is an absolute difference between two delays.
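The delay equation and the variation rule above amount to two one-line calculations, sketched here with illustrative names:

```python
def one_way_delay(tx_timestamp_f: float, rx_time_f: float) -> float:
    """Frame delay = RxTimef - TxTimeStampf. Meaningful only when the
    MEP and RMEP clocks are synchronized."""
    return rx_time_f - tx_timestamp_f

def delay_variation(delay_a: float, delay_b: float) -> float:
    """The delay variation is the absolute difference between two
    measured delays; it is meaningful even without synchronized clocks,
    because a constant clock offset cancels in the difference."""
    return abs(delay_a - delay_b)
```

If two 1DMs yield delays of 4 ms and 6.5 ms, the variation is 2.5 ms.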
802.1p priorities carried in service packets are used to prioritize services. Traffic passing
through the P device on the network shown in Figure 1-254 carries 802.1p priorities of 1
and 2.
One-way frame delay measurement is enabled on PE1 to send traffic with a priority of 1
to measure the frame delay on a link between PE1 and PE2. Traffic with a priority of 2 is
also sent. After receiving traffic with priorities of 1 and 2, the P device forwards traffic
with a higher priority, delaying the arrival of traffic with a priority of 1 at PE2. As a
result, the frame delay calculated on PE2 is inaccurate.
802.1p priority-based one-way frame delay measurement can be enabled to obtain
accurate results.
Two-way frame delay measurement
After two-way frame delay measurement is configured, a MEP periodically sends
DMMs carrying TxTimeStampf (the time when the DMM was sent). After receiving the
DMM, the RMEP replies with a DMR message. This message carries RxTimeStampf
(the time when the DMM was received) and TxTimeStampb (the time when the DMR
was sent). The value in every field of the DMM is copied to the DMR, with the
exception that the source and destination MAC addresses are interchanged. Upon
receipt of the DMR message, the MEP calculates the two-way frame delay using the
following equation:
Frame delay = (RxTimeb - TxTimeStampf) - (TxTimeStampb - RxTimeStampf)
The frame delay can be used to measure the delay variation.
A delay variation is an absolute difference between two delays.
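The two-way equation above subtracts the RMEP's processing time from the round-trip time, which is why no clock synchronization is needed. A minimal sketch (parameter names mirror the timestamps in the equation):

```python
def two_way_delay(tx_timestamp_f: float, rx_timestamp_f: float,
                  tx_timestamp_b: float, rx_time_b: float) -> float:
    """Frame delay = (RxTimeb - TxTimeStampf) - (TxTimeStampb - RxTimeStampf).

    RxTimeStampf and TxTimeStampb are both taken on the RMEP's clock,
    so any constant offset between the two clocks cancels out."""
    return (rx_time_b - tx_timestamp_f) - (tx_timestamp_b - rx_timestamp_f)
```

Even if the RMEP's clock runs 100 seconds ahead, the computed delay is unaffected, because the offset appears in both RMEP timestamps and cancels.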
802.1p priorities carried in service packets are used to prioritize services. Traffic passing
through the P device on the network shown in Figure 1-256 carries 802.1p priorities of 1
and 2.
Two-way frame delay measurement is enabled on PE1 to send traffic with a priority of 1
to measure the frame delay on a link between PE1 and PE2. Traffic with a priority of 2 is
also sent. After receiving traffic with priorities of 1 and 2, the P device forwards traffic
with a higher priority, delaying the arrival of traffic with a priority of 1 at PE2. As a
result, the frame delay calculated on PE2 is inaccurate.
802.1p priority-based two-way frame delay measurement can be enabled to obtain
accurate results.
AIS
AIS is a protocol used to transmit fault information.
A MEP is configured in MD1 with a level of 6 on each of CE1 and CE2 access interfaces on
the user network shown in Figure 1-257. A MEP is configured in MD2 with a level of 3 on
each of PE1 and PE2 access interfaces on a carrier network.
If CFM detects a fault in the link between AIS-enabled PEs, CFM sends AIS packet data
units (PDUs) to CEs. After receiving the AIS PDUs, the CEs suppress alarms,
minimizing the impact of a large number of alarms on a network management system
(NMS).
After the link between the PEs recovers, the PEs stop sending AIS PDUs. CEs do not
receive AIS PDUs during a period of 3.5 times the interval at which AIS PDUs are sent.
Therefore, the CEs cancel the alarm suppression function.
ETH-Test
ETH-Test is used to perform one-way on-demand in-service or out-of-service diagnostic tests
on the throughput, frame loss, and bit errors.
The implementation of these tests is as follows:
Verifying throughput and frame loss: Throughput means the maximum bandwidth of a
link without packet loss. When you use ETH-Test to verify the throughput, a MEP sends
frames with ETH-Test information at a preconfigured traffic rate and collects frame loss
statistics for a specified period. If the statistical results show that the number of sent
frames is greater than the number of received frames, frame loss occurs. The MEP sends
frames at a lower rate until no frame loss occurs. The traffic rate measured at the time
when no packet loss occurs is the throughput of this link.
Verifying bit errors: ETH-Test is implemented by verifying the cyclic redundancy code
(CRC) of the Test TLV field carried in ETH-Test frames. For the ETH-Test
implementation, four types of test patterns can be specified in the test TLV field: Null
signal without CRC-32, Null signal with CRC-32, PRBS 2^31-1 without CRC-32, and
PRBS 2^31-1 with CRC-32. A Null signal is an all-0s signal. A PRBS (pseudo-random
binary sequence) is used to simulate white noise. A MEP sends ETH-Test frames
carrying the calculated CRC value to the RMEP. After receiving the ETH-Test frames,
the RMEP recalculates the CRC value. If the recalculated CRC value is different from
the CRC value carried in the sent ETH-Test frames, bit errors occur.
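The two verifications above can be sketched together: a rate step-down for throughput and a CRC-32 recomputation for bit errors. The function names are hypothetical, and zlib's CRC-32 stands in for the Test TLV checksum:

```python
import zlib

def throughput(max_rate: float, lossless_at, step: float = 10.0) -> float:
    """Lower the ETH-Test frame rate from max_rate until no frame loss
    occurs; the first lossless rate is taken as the link throughput.
    lossless_at is a probe callback: rate -> True if no frames were lost."""
    rate = max_rate
    while rate > 0:
        if lossless_at(rate):
            return rate
        rate -= step
    return 0.0

def make_test_payload(pattern: bytes) -> bytes:
    """Append a CRC-32 over the test pattern ('with CRC-32' modes)."""
    return pattern + zlib.crc32(pattern).to_bytes(4, "big")

def has_bit_errors(payload: bytes) -> bool:
    """The RMEP recomputes the CRC-32 over the received pattern; a
    mismatch with the carried CRC indicates bit errors."""
    pattern, crc = payload[:-4], payload[-4:]
    return zlib.crc32(pattern).to_bytes(4, "big") != crc
```

For the Null-signal mode, the pattern is all zero bytes; flipping any single bit in transit changes the recomputed CRC and is therefore detected.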
ETH-Test provides two types of test modes: out-of-service ETH-Test and in-service
ETH-Test:
Out-of-service ETH-Test mode: Client data traffic is interrupted in the diagnosed entity.
To resolve this issue, the out-of-service ETH-Test function must be used together with
the ETH-LCK function.
In-service ETH-Test mode: Client data traffic is not interrupted, and the frames with the
ETH-Test information are transmitted using part of bandwidths.
ETH-LCK
ETH-LCK is used for administrative locking on the MEP in the outer MD with a higher level
than the inner MD, that is, preventing CC alarms from being generated in the outer MD.
When implementing ETH-LCK, a MEP in the inner MD sends frames with the ETH-LCK
information to the MEP in the outer MD. After receiving the frames with the ETH-LCK
information, the MEP in the outer MD can differentiate the alarm suppression caused by
administrative locking from the alarm suppression caused by a fault in the inner MD (the AIS
function).
To suppress CC alarms from being generated in the outer MD, ETH-LCK is implemented
with out-of-service ETH-Test. A MEP in the inner MD with a lower level initiates ETH-Test
by sending an ETH-LCK frame to a MEP in the outer MD. Upon receipt of the ETH-LCK
frame, the MEP in the outer MD suppresses all CC alarms immediately and reports an
ETH-LCK alarm indicating administrative locking. Before out-of-service ETH-Test is
complete, the MEP in the inner MD sends ETH-LCK frames to the MEP in the outer MD.
After out-of-service ETH-Test is complete, the MEP in the inner MD stops sending
ETH-LCK frames. If the MEP in the outer MD does not receive ETH-LCK frames for a
period 3.5 times as long as the specified interval, it releases the alarm suppression and reports
a clear ETH-LCK alarm.
As shown in Figure 1-258, MD2 with the level of 3 is configured on PE1 and PE2; MD1 with
the level of 6 is configured on CE1 and CE2. When PE1's MEP1 sends out-of-service
ETH-Test frames to PE2's MEP2, MEP1 also sends ETH-LCK frames to CE1's MEP11 and
CE2's MEP22 separately to suppress MEP11 and MEP22 from generating CC alarms. When
MEP1 stops sending out-of-service ETH-Test frames, it also stops sending ETH-LCK frames.
If MEP11 and MEP22 do not receive ETH-LCK frames for a period 3.5 times as long as the
specified interval, they release the alarm suppression.
Single-ended ETH-SLM
SLM measures frame loss using synthetic frames instead of data traffic. When implementing
SLM, the local MEP exchanges frames containing ETH-SLM information with one or more
RMEPs.
Figure 1-259 demonstrates the process of single-ended SLM:
1. The local MEP sends ETH-SLM request frames to the RMEPs.
2. After receiving the ETH-SLM request frames, the RMEPs send ETH-SLM reply frames
to the local MEP.
A frame with the single-ended ETH-SLM request information is called an SLM, and a frame
with the single-ended ETH-SLM reply information is called an SLR. SLM frames carry SLM
protocol data units (PDUs), and SLR frames carry SLR PDUs.
Single-ended SLM and single-ended frame LM are differentiated as follows: On the
point-to-multipoint network shown in Figure 1-259, inward MEPs are configured on PE1's
and PE3's interfaces, and single-ended frame LM is performed on the PE1-PE3 link. Traffic
coming through PE1's interface is destined for both PE2 and PE3, and single-ended frame LM
will collect frame loss statistics for all traffic, including the PE1-to-PE2 traffic. As a result, the
collected statistics are not accurate. Unlike single-ended frame LM, single-ended SLM
collects frame loss statistics only for the PE1-to-PE3 traffic, which is more accurate.
When implementing single-ended SLM, PE1 sends SLM frames to PE3 and receives SLR
frames from PE3. SLM frames contain TxFCf, the value of TxFCl (frame transmission
counter), indicating the frame count at the transmit time. SLR frames contain the following
information:
TxFCf: value of TxFCl (frame transmission counter) indicating the frame count on PE1
upon the SLM transmission
TxFCb: value of RxFCl (frame receive counter) indicating the frame count on PE3 upon
the SLR transmission
After receiving the last SLR frame during a measurement period, a MEP on PE1 measures the
near-end and far-end frame loss based on the following values:
Last received SLR's TxFCf and TxFCb, and value of RxFCl (frame receive counter)
indicating the frame count on PE1 upon the SLR reception. These values are represented
as TxFCf[tc], TxFCb[tc], and RxFCl[tc].
tc indicates the time when the last SLR frame was received during the measurement
period.
Previously received SLR's TxFCf and TxFCb, and value of RxFCl (frame receive
counter) indicating the frame count on PE1 upon the SLR reception. These values are
represented as TxFCf[tp], TxFCb[tp], and RxFCl[tp].
tp indicates the time when the last SLR frame was received during the previous
measurement period.
Far-end frame loss = |TxFCf[tc] – TxFCf[tp]| – |TxFCb[tc] – TxFCb[tp]|
Near-end frame loss = |TxFCb[tc] – TxFCb[tp]| – |RxFCl[tc] – RxFCl[tp]|
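The SLM loss calculation can be checked numerically. Representing the SLR snapshots as dicts is an assumption of this sketch; RxFCl is PE1's receive counter, as defined above:

```python
def slm_frame_loss(cur, prev):
    """Far- and near-end frame loss from two SLR snapshots taken at the
    ends of consecutive measurement periods (counters TxFCf, TxFCb,
    and RxFCl, valued at SLR reception time on PE1)."""
    far = abs(cur["TxFCf"] - prev["TxFCf"]) - abs(cur["TxFCb"] - prev["TxFCb"])
    near = abs(cur["TxFCb"] - prev["TxFCb"]) - abs(cur["RxFCl"] - prev["RxFCl"])
    return far, near
```

If PE1 sent 100 SLMs in a period, PE3 returned 95 SLRs, and PE1 received 93 of them, the far-end loss is 5 and the near-end loss is 2.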
On a network, each packet carries the IEEE 802.1p field, indicating its priority. According to
packet priority, different QoS policies will be applied. On the network shown in Figure 1-260,
the PE1-to-PE3 traffic has two priorities: 1 and 2, as indicated by the IEEE 802.1p field.
When implementing single-ended SLM for traffic over the PE1-PE3 link, PE1 sends SLM
frames with varied priorities and checks the frame loss. Based on the check result, the
network administrator can adjust the QoS policy for the link.
ETH-BN
Ethernet bandwidth notification (ETH-BN) enables server-layer MEPs to notify client-layer
MEPs of the server layer's connection bandwidth when routing devices connect to microwave
devices. The server-layer devices are microwave devices, which dynamically adjust the
bandwidth according to the prevailing atmospheric conditions. The client-layer devices are
routing devices. Routing devices can only function as ETH-BN packets' receive ends and
must work with microwave devices to implement this function.
As shown in Figure 1-261, server-layer MEPs are configured on the server-layer devices, and
the ETH-BN sending function is enabled. The levels of client-layer MEPs must be specified
for the server-layer MEPs when the ETH-BN sending function is enabled. Client-layer MEPs
are configured on the client-layer devices, and the ETH-BN receiving function is enabled. The
levels of the client-layer MEPs are the same as those specified for the server-layer MEPs.
If the ETH-BN function has been enabled on the server-layer devices Device2 and
Device3 and the bandwidth of the server-layer devices' microwave links decreases, the
server-layer devices send ETH-BN packets to the client-layer devices (Device1 and
Device4). After receiving the ETH-BN packets, the client-layer MEPs can use bandwidth
information in the packets to adjust service policies, for example, to reduce the rate of
traffic sent to the degraded links.
When the server-layer devices' microwave links work properly, whether to send
ETH-BN packets is determined by the configuration of the server-layer devices. When
the server-layer microwave devices stop sending ETH-BN packets, the client-layer
devices do not receive any ETH-BN packets. The ETH-BN data on the client-layer
devices is aged after 3.5 times the interval at which ETH-BN packets are sent.
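The 3.5-times aging rule described above can be sketched as follows. This is a minimal illustration of the rule, not device behavior: the class and method names, and the idea of tracking arrival timestamps explicitly, are assumptions.

```python
# Illustrative sketch of ETH-BN data aging on a client-layer device:
# bandwidth information is discarded if no ETH-BN packet arrives within
# 3.5 times the sending interval. All names are assumptions.

class EthBnState:
    AGING_MULTIPLIER = 3.5  # from the rule described above

    def __init__(self, send_interval_s):
        self.send_interval_s = send_interval_s
        self.bandwidth = None
        self.last_rx_time = None

    def on_packet(self, now_s, bandwidth):
        # A received ETH-BN packet refreshes the bandwidth data and its timer.
        self.bandwidth = bandwidth
        self.last_rx_time = now_s

    def current_bandwidth(self, now_s):
        """Return the advertised bandwidth, or None once the data has aged out."""
        if self.last_rx_time is None:
            return None
        if now_s - self.last_rx_time > self.AGING_MULTIPLIER * self.send_interval_s:
            return None  # aged out: 3.5 send intervals passed with no packet
        return self.bandwidth

state = EthBnState(send_interval_s=1.0)
state.on_packet(now_s=0.0, bandwidth=100)
print(state.current_bandwidth(3.0))  # 100 (within 3.5 intervals)
print(state.current_bandwidth(4.0))  # None (aged out)
```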
When planning ETH-BN, ensure that the service burst traffic matches the device's buffer
capability.
Usage Scenario
Y.1731 supports performance statistics collection on both end-to-end and end-to-multi-end
links.
End-to-end performance statistics collection
On the network shown in Figure 1-262, Y.1731 collects statistics about the end-to-end link
performance between the CE and PE1, between PE1 and PE2, or between the CE and PE3.
End-to-multi-end performance statistics collection
On the network shown in Figure 1-263, user-to-network traffic from different users traverses
CE1 and CE2 and is converged on CE3. CE3 forwards the converged traffic to the UPE.
Network-to-user traffic traverses CE3, and CE3 forwards the traffic to CE1 and CE2.
When Y.1731 is used to collect statistics about the link performance between the CE and the
UPE, end-to-end performance statistics collection cannot be implemented. This is because
only one inbound interface (on the UPE) sends packets but two outbound interfaces (on CE1
and CE2) receive the packets. In this case, statistics on the outbound interfaces fail to be
collected. To resolve this issue, end-to-multi-end performance statistics collection can be
implemented.
The packets carry the MAC address of CE1 or CE2. The UPE identifies the outbound
interface based on the destination MAC address carried in the packets and collects end-to-end
performance statistics.
Both end-to-multi-end and end-to-end performance statistics collection apply to VLL, VPLS,
and VLAN scenarios, and both use the same statistics collection principles.
Figure 1-264 Fault information advertisement between EFM and detection modules
The following example illustrates fault information advertisement between EFM and
detection modules over the path CE5 -> CE4 -> CE1 -> PE2 -> PE4 on the network shown in
Figure 1-264.
Table 1-77 Fault information advertisement between EFM and detection modules

Function Deployment: EFM is used to monitor the direct link between CE1 and PE2, and
CFM is used to monitor the link between PE2 and PE6.

Issue to Be Resolved: Although EFM detects a fault, EFM cannot notify PE6 of the fault. As
a result, PE6 still forwards traffic to PE2, causing a traffic interruption. Similarly, although
CFM detects a fault, CFM cannot notify CE1 of the fault.

Solution: The EFM module can be associated with the CFM module. If the EFM module
detects a fault, it instructs the OAMMGR module to notify the CFM module of the fault. If
the CFM module detects a fault, it instructs the OAMMGR module to notify the EFM
module of the fault.
Figure 1-265 Fault information advertisement between EFM and application modules
Table 1-78 describes fault information advertisement between EFM and VRRP modules.
Table 1-78 Fault information advertisement between EFM and VRRP modules

Function Deployment: A VRRP backup group is configured to determine the master/backup
status of provider edge-aggregations (PE-AGGs). EFM is used to monitor links between the
UPE and PE-AGGs.

Issue to Be Resolved: If links connected to a VRRP backup group fail, VRRP packets cannot
be sent to negotiate the master/backup status. A backup VRRP device preempts the Master
state after a period of three times the interval at which VRRP packets are sent. As a result,
data loss occurs.

Solution: To help prevent data loss, the VRRP module can be associated with the EFM
module. If a fault occurs, the EFM module notifies the VRRP module of the fault. After
receiving the notification, the VRRP module triggers a master/backup VRRP switchover.
Figure 1-266 Networking for fault information advertisement between CFM and detection
modules
The following example illustrates fault information advertisement between CFM and
detection modules over the path UPE1 -> PE2 -> PE4 -> PE6 -> PE8 on the network shown in
Figure 1-266.
Table 1-79 Fault information advertisement between CFM and detection modules
Table 1-80 describes fault information advertisement between CFM and VRRP modules.
Table 1-80 Fault information advertisement between CFM and VRRP modules
1.5.7.6 Applications
1.5.7.6.1 Ethernet OAM Applications on a MAN
EFM, CFM, and Y.1731 can be combined to provide E2E Ethernet OAM solutions,
implementing E2E Ethernet service management.
Figure 1-269 shows a typical MAN network. The following example illustrates Ethernet
OAM applications on a MAN.
EFM is used to monitor P2P direct links between a digital subscriber line access
multiplexer (DSLAM) and a user-end provider edge (UPE) or between a LAN switch
(LSW) and a UPE. If EFM detects errored frames, codes, or frame seconds, it sends
alarms to the network management system (NMS) to provide information for a network
administrator. EFM uses the loopback function to assess link quality.
CFM is used to monitor E2E links between a UPE and an NPE or between a UPE and a
provider edge-aggregation (PE-AGG). A network planning engineer groups the devices
of each Internet service provider (ISP) into an MD and maps a type of service to an MA.
A network maintenance engineer enables maintenance points to exchange CCMs to
monitor link connectivity.
A mobile backhaul network shown in Figure 1-270 consists of a transport network between a
cell site gateway (CSG) and remote service gateways (RSGs) and a wireless network between
NodeBs/eNodeBs and the CSG. Carriers operate the transport and wireless networks
separately. Therefore, traffic transmitted on the transport network of one carrier is invisible to
devices on the wireless network of another carrier.
Ethernet OAM can be used on the transport and wireless networks to identify and locate
faults.
EFM monitors Layer 2 links between a NodeB/eNodeB and CSG1.
− EFM is used to monitor the connectivity of links between a NodeB/eNodeB and
CSG1 or between RNCs and RSGs.
− EFM detects errored codes, frames, and frame seconds on links between a
NodeB/eNodeB and CSG1 and between RNCs and RSGs. If the number of errored
codes, frames, or frame seconds exceeds a configured threshold, an alarm is sent to
the NMS. A network administrator is notified of link quality deterioration and can
assess the risk of adverse impact on voice traffic.
− Loopback is used to monitor the quality of voice links between a NodeB/eNodeB
and CSG1 or between RNCs and RSGs.
CFM is used to locate faulty links over which E2E services are transmitted.
− CFM periodically monitors links between CSG1 and the RSGs. If CFM detects a fault,
it sends an alarm to the NMS. A network
administrator analyzes alarm information and takes measures to rectify the fault.
− Loopback and linktrace are enabled on links between CSG1 and the RSGs to facilitate
link fault diagnosis.
Y.1731 is used together with CFM to monitor link performance and voice and data traffic
quality.
Definition
Dual-device backup is a feature that ensures service traffic continuity in scenarios in which a
master/backup status negotiation protocol (for example, VRRP or E-Trunk) is deployed.
Dual-device backup enables the master device to back up service control data to the backup
device in real time. When the master device or the link directly connected to the master device
fails, service traffic quickly switches to the backup device. When the master device or the link
directly connected to the master device recovers, service traffic switches back to the master
device. Therefore, dual-device backup improves service and network reliability.
Related Concepts
If VRRP is used as a master/backup status negotiation protocol, dual-device backup involves
the following concepts:
VRRP
VRRP is a fault-tolerant protocol that groups several routers into a virtual router. If the
next hop of a host is faulty, VRRP switches traffic to another router, which ensures
communication continuity and reliability.
For details about VRRP, see the chapter "VRRP" in NE20E Feature Description -
Network Reliability.
RUI
RUI is a Huawei-specific redundancy protocol that is used to back up user information
between devices. RUI, which is carried over the Transmission Control Protocol (TCP),
specifies which user information can be transmitted between devices and the format and
amount of user information to be transmitted.
RBS
The remote backup service (RBS) is an RUI module used for inter-device backup. A
service module uses the RBS to synchronize service control data from the master device
to the backup device. When a master/backup VRRP switchover occurs, service traffic
quickly switches to a new master device.
RBP
The remote backup profile (RBP) is a configuration template that provides a unified user
interface for dual-device backup configurations.
If E-Trunk is used as a master/backup status negotiation protocol, dual-device backup
involves the following concept:
E-Trunk
E-Trunk implements inter-device link aggregation, providing device-level reliability.
E-Trunk aggregates data links of multiple devices to form a link aggregation group
(LAG). If a link or device fails, services are automatically switched to the other available
links or devices in the E-Trunk, improving link and device-level reliability.
For details about E-Trunk, see "E-Trunk" in NE20E Feature Description - LAN Access
and MAN Access.
Purpose
In traditional service scenarios, all users use a single device to access a network. Once the
device or the link directly connected to the device fails, all user services are interrupted, and
the service recovery time is uncertain. To resolve this issue, deploy dual-device backup to
enable the master device to back up service control data to the backup device in real time.
The NE20E supports only dual-device hot backup for Address Resolution Protocol (ARP)
services, also called dual-device ARP hot backup.
Dual-device ARP hot backup enables the master device to back up the ARP entries at the
control and forwarding layers to the backup device in real time. When the backup device
switches to a master device, it uses the backup ARP entries to generate host routing
information without needing to relearn ARP entries, ensuring downlink traffic continuity.
− Manually triggered dual-device ARP hot backup: You must manually establish a
backup platform and backup channel for the master and backup devices. In addition,
you must manually trigger ARP entry backup from the master device to the backup
device. This backup mode has complex configurations.
− Automatically enabled dual-device ARP hot backup: You need to establish only a
backup channel between the master and backup devices, and the system
automatically implements ARP entry backup. This backup mode has simple
configurations.
Dual-device IGMP snooping hot backup enables the master device to back up IGMP
snooping entries to the backup device in a master/backup E-Trunk scenario. If the master
device or the link between the master device and user fails, the backup device switches
to a master device and takes over, ensuring multicast service continuity.
Benefits
Benefits to users
− Improved user experience
Benefits to operators
− Improved network reliability from the perspective of service reliability
1.5.8.2 Applications
1.5.8.2.1 Dual-Device ARP Hot Backup
Networking Description
Dual-device ARP hot backup enables the master device to back up the ARP entries at the
control and forwarding layers to the backup device in real time. When the backup device
switches to a master device, it uses the backup ARP entries to generate host routing
information. After you deploy dual-device ARP hot backup, the new master device forwards
downlink traffic without needing to relearn ARP entries. Dual-device ARP hot backup ensures
downlink traffic continuity.
Dual-device ARP hot backup applies in both Virtual Router Redundancy Protocol (VRRP) and enhanced
trunk (E-Trunk) scenarios. This section describes the implementation of dual-device ARP hot backup in
VRRP scenarios.
Figure 1-273 shows a typical network topology in which a Virtual Router Redundancy
Protocol (VRRP) backup group is deployed. In the topology, Device A is a master device, and
Device B is a backup device. In normal circumstances, Device A forwards both uplink and
downlink traffic. If Device A or the link between Device A and the switch fails, a
master/backup VRRP switchover is triggered to switch Device B to the Master state. Device B
needs to advertise a network segment route to a device on the network side to direct downlink
traffic from the network side to Device B. If Device B has not learned ARP entries from a
device on the user side, the downlink traffic is interrupted. Device B forwards the downlink
traffic only after it learns ARP entries from a device on the user side.
Feature Deployment
To prevent downlink traffic from being interrupted because Device B does not learn ARP
entries from a device on the user side, deploy dual-device ARP hot backup on Device A and
Device B, as shown in Figure 1-274.
After the deployment, Device B backs up the ARP entries on Device A in real time. If a
master/backup VRRP switchover occurs, Device B forwards downlink traffic based on the
backup ARP entries without needing to relearn ARP entries from a device on the user side.
Networking Description
Dual-device IGMP snooping hot backup enables the master and backup devices to generate
multicast entries synchronously in real time. IGMP protocol packets are synchronized from
the master device to the backup device so that the same multicast forwarding entries can be
generated on the backup device. After you deploy dual-device IGMP snooping hot backup,
the new master device forwards downlink traffic without needing to relearn multicast
forwarding entries through IGMP snooping. Dual-device IGMP snooping hot backup ensures
downlink traffic continuity.
Figure 1-275 shows a typical network topology in which an Eth-Trunk group is deployed. In
the topology, Device A is a master device, and Device B is a backup device. In normal
circumstances, Device A forwards both uplink and downlink traffic. If Device A or the link
between Device A and the switch fails, a master/backup Eth-Trunk link switchover is
triggered to switch Device B to the Master state. Device B needs to advertise a network
segment route to a device on the network side to direct downlink traffic from the network side
to Device B. If Device B has not generated multicast forwarding entries directing traffic to the
user side, the downlink traffic is interrupted. Device B forwards the downlink traffic only
after it generates forwarding entries directing traffic to the user side.
Feature Deployment
To prevent downlink traffic from being interrupted because Device B does not generate
multicast forwarding entries directing traffic to the user side, deploy dual-device IGMP
snooping hot backup on Device A and Device B, as shown in Figure 1-276.
After the deployment, Device A and Device B generate the same multicast forwarding entries
at the same time. If a master/backup Eth-Trunk link switchover occurs, Device B forwards
downlink traffic based on the generated multicast forwarding entries without needing to
generate the entries directing traffic to the user side.
Terms

Dual-device backup: A feature in which one device functions as a master device and the
other functions as a backup device. In normal circumstances, the master device provides
service access and the backup device monitors the running status of the master device. When
the master device fails, the backup device switches to a master device and provides service
access, ensuring service traffic continuity.

Remote Backup Profile (RBP): A configuration template that provides a unified user
interface for dual-system backup configurations.

Remote Backup Service (RBS): An inter-device backup channel, used to synchronize data
between two devices so that user services can smoothly switch from a faulty device to
another device during a master/backup device switchover.

Redundancy User Information (RUI): A Huawei-proprietary protocol used by devices to
back up user information between each other over TCP connections.
Definition
A bit error refers to the deviation between a bit that is sent and the bit that is received. Cyclic
redundancy checks (CRCs) are commonly used to detect bit errors. Bit errors caused by line
faults can be corrected by rectifying the associated link faults. Random bit errors caused by
optical fiber aging or optical signal jitter, however, are more difficult to correct.
Bit-error-triggered protection switching is a reliability mechanism that triggers protection
switching based on bit error events (bit error occurrence event or correction event) to
minimize bit error impact.
Purpose
The demand for network bandwidth is rapidly increasing as mobile services evolve from
narrowband voice services to integrated broadband services, including voice and streaming
media. Meeting the demand for bandwidth with traditional bearer networks dramatically
raises carriers' operation costs. To tackle the challenges posed by this rapid
broadband-oriented development, carriers urgently need mobile bearer networks that are
flexible, low-cost, and highly efficient. IP-based mobile bearer networks are an ideal choice.
IP radio access networks (IPRANs), a type of IP-based mobile bearer network, are being
increasingly widely used.
Traditional bearer networks minimize bit error impact by using retransmission or a
mechanism in which the receiving end accepts only one of the multiple packet copies sent by
the other end. IPRANs have higher reliability requirements than traditional bearer networks
when carrying broadband services, and traditional fault detection mechanisms cannot trigger
protection switching based on random bit errors. As a result, bit errors may degrade or even
interrupt services on an IPRAN.
To solve this problem, configure bit-error-triggered protection switching.
To prevent impacts on services, check whether protection links have sufficient bandwidth resources
before deploying bit-error-triggered protection switching.
Benefits
Bit-error-triggered protection switching offers the following benefits:
Protects traffic against random bit errors, meeting high reliability requirements and
improving service quality.
Enables devices to record bit error events. These records help carriers locate the nodes or
lines that have bit errors and take corrective measures accordingly.
1.5.9.2 Principles
1.5.9.2.1 Bit Error Detection
Background
Bit-error-triggered protection switching enables link bit errors to trigger protection switching
on network applications, minimizing the impact of bit errors on services. To implement
bit-error-triggered protection switching, establish an effective bit error detection mechanism
to ensure that network applications promptly detect bit errors.
Related Concepts
Bit error detection involves the following concepts:
Bit error: deviation between a bit that is sent and the bit that is received.
BER: number of bit errors divided by the total number of transferred bits during a certain
period. The BER can be considered as an approximate estimate of the probability of a bit
error occurring on any particular bit.
LSP BER: calculation result based on the BER of each node on an LSP.
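The BER definition above is a simple ratio, sketched below. The function name and sample numbers are illustrative only.

```python
# Minimal sketch of the BER definition: errored bits divided by the total
# number of bits transferred during a period.

def bit_error_rate(errored_bits, total_bits):
    if total_bits == 0:
        raise ValueError("no bits transferred in the period")
    return errored_bits / total_bits

# 3 errored bits out of 10^9 transferred bits gives a BER of 3e-9.
print(bit_error_rate(3, 10**9))  # 3e-09
```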
If a transit node detects bit errors on a static CR-LSP or PW, the transit node uses AIS packets
to advertise the bit error status to the egress, triggering a traffic switchover on the static
CR-LSP or PW. On the network shown in Figure 1-278, a static CR-LSP is deployed from
PE1 to PE2. If the transit node P detects bit errors:
1. The P node uses AIS packets to notify PE2 of the bit error event.
2. After receiving the AIS packets, PE2 reports an AIS alarm to trigger local protection
switching. PE2 then sends CRC-AIS packets to PE1 and uses the APS protocol to
complete protection switching through negotiation with PE1.
3. After receiving the CRC-AIS packets, PE1 reports a CRC-AIS alarm.
Background
If bit errors occur on an interface, deploy bit-error-triggered section switching to trigger an
upper-layer application associated with the interface for a service switchover.
Implementation Principles
Trigger-section bit error detection must be enabled on an interface. After detecting bit errors
on an inbound interface, a device notifies the interface management module of the bit errors.
The link-layer protocol status of the interface then changes to bit-error-detection Down,
triggering an upper-layer application associated with the interface for a service switchover.
After the bit errors are cleared, the link-layer protocol status of the interface changes to Up,
triggering an upper-layer application associated with the interface for a service switchback.
The device also notifies the BFD module of the bit error status, and then uses BFD packets to
advertise the bit error status to the peer device.
If bit-error-triggered section switching also has been deployed on the peer device, the bit
error status is advertised to the interface management module of the peer device. The
link-layer protocol status of the interface then changes to bit-error-detection Down or Up,
triggering an upper-layer application associated with the interface for a service
switchover or switchback.
If bit-error-triggered section switching is not deployed on the peer device, the peer
device cannot detect the bit error status of the interface's link. In this case, the peer
device can only depend on an upper-layer application (for example, IGP) for link fault
detection.
For example, on the network shown in Figure 1-279, trigger-section bit error detection is
enabled on each interface, and nodes communicate through IS-IS routes. In normal cases,
IS-IS routes on PE1 and PE2 are preferentially transmitted over the primary link. Therefore,
traffic in both directions is forwarded over the primary link. If PE2 detects bit errors on the
interface to PE1:
The link-layer protocol status of the interface changes to bit-error-detection Down,
triggering IS-IS routes to be switched to the secondary link. Traffic from PE2 to PE1 is
then forwarded over the secondary link. PE2 uses a BFD packet to notify PE1 of the bit
errors.
After receiving the BFD packet, PE1 sets the link-layer protocol status of the
corresponding interface to bit-error-detection Down, triggering IS-IS routes to be
switched to the secondary link. Traffic from PE1 to PE2 is then forwarded over the
secondary link.
If trigger-section bit error detection is not supported or enabled on PE1's interface to PE2,
PE1 can only use IS-IS to detect that the primary link is unavailable, and then performs an
IS-IS route switchover.
Usage Scenario
If LDP LSPs are used, deploy bit-error-triggered section switching to cope with link bit errors
on the LDP LSPs.
After bit-error-triggered section switching is deployed, if bit errors occur on both the primary and
secondary links on an LDP LSP, the interface status changes to bit-error-detection Down on both the
primary and secondary links. As a result, services are interrupted. Therefore, it is recommended that you
deploy bit-error-triggered IGP route switching.
Background
Bit-error-triggered section switching can cope with link bit errors. If bit errors occur on both
the primary and secondary links, bit-error-triggered section switching changes the interface
status on both the primary and secondary links to bit-error-detection Down. As a result,
services are interrupted because no link is available. To resolve the preceding issue, deploy
bit-error-triggered IGP route switching. After the deployment is complete, link bit errors
trigger IGP route costs to be adjusted, preventing upper-layer applications from transmitting
service traffic to links with bit errors. Bit-error-triggered IGP route switching ensures normal
running of upper-layer applications and minimizes the impact of bit errors on services.
Implementation Principles
Link-quality bit error detection must be enabled on an interface. After detecting bit errors on
an inbound interface, a device notifies the interface management module of the bit errors. The
link quality level of the interface then changes to Low, triggering an IGP (OSPF or IS-IS) to
increase the cost of the interface's link. In this case, IGP routes do not preferentially select the
link with bit errors. After the bit errors are cleared, the link quality level of the interface
changes to Good, triggering the IGP to restore the original cost for the interface's link. In this
case, IGP routes preferentially select the link again. The device also notifies the BFD module
of the bit error status, and then uses BFD packets to advertise the bit error status to the peer
device.
If bit-error-triggered IGP route switching also has been deployed on the peer device, the
bit error status is advertised to the interface management module of the peer device. The
link quality level of the interface then changes to Low or Good, triggering the IGP to
increase the cost of the interface's link or restore the original cost for the link. IGP routes
on the peer device then do not preferentially select the link with bit errors or
preferentially select the link again.
If bit-error-triggered IGP route switching is not deployed on the peer device, the peer
device cannot detect the bit error status of the interface's link. Therefore, the IGP does
not adjust the cost of the link. Traffic from the peer device may still pass through the link
with bit errors. As a result, bidirectional IGP routes pass through different links. The
local device can receive traffic properly, and services are not interrupted. However, the
impact of bit errors on services cannot be eliminated.
For example, on the network shown in Figure 1-280, link-quality bit error detection is enabled
on each interface, and nodes communicate through IS-IS routes. In normal cases, IS-IS routes
on PE1 and PE2 are preferentially transmitted over the primary link. Therefore, traffic in both
directions is forwarded over the primary link. If PE2 detects bit errors on interface 1:
PE2 adjusts the link quality level of interface 1 to Low, triggering IS-IS to increase the
cost of the interface's link to a value (for example, 40). PE2 uses a BFD packet to
advertise the bit errors to PE1.
After receiving the BFD packet, PE1 also adjusts the link quality level of interface 1 to
Low, triggering IS-IS to increase the cost of the interface's link to a value (for example,
40).
IS-IS routes on both PE1 and PE2 preferentially select the secondary link, because the cost
(20) of the secondary link is less than the cost (40) of the primary link. Traffic in both
directions is then switched to the secondary link.
If bit-error-triggered IGP route switching is not supported or enabled on PE1, PE1 cannot
detect the bit errors. In this case, PE1 still sends traffic to PE2 through the primary link. PE2
can receive traffic properly, but services are affected by the bit errors.
If PE2 detects bit errors on both interface 1 and interface 2, PE2 adjusts the link quality levels
of the interfaces to Low, triggering the costs of the interfaces' links to be increased to 40.
IS-IS routes on PE2 still preferentially select the primary link to ensure service continuity,
because the cost (40) of the primary link is less than the cost (50) of the secondary link. To
eliminate the impact of bit errors on services, you must manually restore the link quality.
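The cost comparison in this example can be sketched as follows. The cost values, and the policy of raising a Low-quality link's cost to a fixed value of 40, are taken from the example above; the function names and data layout are assumptions.

```python
# Sketch of IGP route selection under bit-error-triggered cost adjustment:
# a link whose quality level is Low gets its cost raised, and routes then
# prefer whichever link has the lower effective cost.

LOW_QUALITY_COST = 40  # example cost applied to a link whose quality is Low

def effective_cost(base_cost, quality):
    return LOW_QUALITY_COST if quality == "Low" else base_cost

def preferred_link(links):
    """links: dict of name -> (base_cost, quality). Lowest effective cost wins."""
    return min(links, key=lambda name: effective_cost(*links[name]))

# Bit errors only on the primary link: the secondary link (cost 20) wins.
print(preferred_link({"primary": (10, "Low"), "secondary": (20, "Good")}))

# If the secondary path is more expensive (cost 50), the primary link is
# still preferred even with bit errors (40 < 50), so the impact of the bit
# errors on services is not eliminated.
print(preferred_link({"primary": (10, "Low"), "secondary": (50, "Good")}))
```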
Bit-error-triggered section switching and bit-error-triggered IGP route switching are mutually exclusive.
Usage Scenario
If LDP LSPs are used, deploy bit-error-triggered IGP route switching to cope with link bit
errors on the LDP LSPs. Bit-error-triggered IGP route switching ensures service continuity
even if bit errors occur on both the primary and secondary links on an LDP LSP. Therefore, it
is recommended that you deploy bit-error-triggered IGP route switching.
Background
If a trunk interface is used to increase bandwidth, improve reliability, and implement load
balancing, deploy bit-error-triggered trunk update to cope with bit errors detected on trunk
member interfaces.
Implementation Principles
According to the type of protection switching triggered, bit-error-triggered trunk update is
classified into modes such as trunk-bit-error-triggered IGP route switching, in which
link-quality bit error detection must be deployed on the trunk interface. After detecting bit
errors on a trunk interface's member
interface, a device advertises the bit errors to the trunk interface, triggering the trunk interface
to delete the member interface from the forwarding plane. The trunk interface then does not
select the member interface to forward traffic. After the bit errors are cleared from the
member interface, the trunk interface re-adds the member interface to the forwarding plane.
The trunk interface can then select the member interface to forward traffic. If bit errors occur
on all trunk member interfaces or the number of member interfaces without bit errors is lower
than the lower threshold for the trunk interface's Up links, the trunk interface ignores the bit
errors on the member interfaces and remains Up. However, the link quality level of the trunk
interface becomes Low, triggering an IGP (OSPF or IS-IS) to increase the cost of the trunk
interface's link. IGP routes then do not preferentially select the link. If the number of member
interfaces without bit errors reaches the lower threshold for the trunk interface's Up links, the
link quality level of the trunk interface changes to Good, triggering the IGP to restore the
original cost for the trunk interface's link. In this case, IGP routes preferentially select the link
again.
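The member-interface handling described above can be sketched as a small decision function. This is an illustrative sketch under stated assumptions, not device behavior: the function name, data layout, and return structure are all hypothetical.

```python
# Sketch of trunk update on bit errors: members with bit errors are removed
# from the forwarding plane, unless that would leave fewer healthy members
# than the trunk's lower threshold for Up links; in that case the bit errors
# are ignored, the trunk stays Up, and its link quality level drops to Low
# so that the IGP raises the cost of the trunk link.

def trunk_update(members, errored, min_up_links):
    """members: list of member names; errored: set of members with bit errors."""
    healthy = [m for m in members if m not in errored]
    if len(healthy) >= min_up_links:
        # Forward only over healthy members; trunk quality stays Good.
        return {"forwarding_members": healthy, "link_quality": "Good"}
    # Too few healthy members: keep all members in the forwarding plane and
    # signal Low quality instead of removing members.
    return {"forwarding_members": list(members), "link_quality": "Low"}

print(trunk_update(["m1", "m2", "m3"], {"m1"}, min_up_links=2))
# {'forwarding_members': ['m2', 'm3'], 'link_quality': 'Good'}
print(trunk_update(["m1", "m2", "m3"], {"m1", "m2"}, min_up_links=2))
# {'forwarding_members': ['m1', 'm2', 'm3'], 'link_quality': 'Low'}
```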
The device also notifies the BFD module of the bit error status, and then uses BFD packets to
advertise the bit error status to the peer device connected to the trunk interface.
If trunk-bit-error-triggered IGP route switching also has been deployed on the peer
device, the bit error status is advertised to the trunk interface of the peer device. The
trunk interface is then triggered to delete or re-add the member interface from or to the
forwarding plane. The link quality level of the trunk interface is also triggered to change
to Low or Good. In this case, the cost of IGP routes is adjusted, implementing
switchover or switchback synchronization with the device.
If trunk-bit-error-triggered IGP route switching is not deployed on the peer device, the
peer device cannot detect the bit error status of the interface's link. If the trunk interface
of the device has deleted the member interface with bit errors from the forwarding plane,
the trunk interface of the peer device may still select the member interface to forward
traffic. Similarly, if the link quality level of the trunk interface on the device has changed
to Low, the IGP is triggered to increase the cost of the trunk interface's link. In this case,
IGP routes do not preferentially select the link. However, IGP on the peer device does
not adjust the cost of the link. Traffic from the peer device may still pass through the link
with bit errors. As a result, bidirectional IGP routes pass through different links. To
ensure normal running of services, the device can receive traffic from the member
interface with bit errors. However, bit errors may affect service quality.
Layer 2 trunk interfaces do not support an IGP. Therefore, bit-error-triggered IGP route switching cannot
be deployed on Layer 2 trunk interfaces. If bit errors occur on all Layer 2 trunk member interfaces or the
number of member interfaces without bit errors is lower than the lower threshold for the trunk interface's
Up links, the trunk interface remains in the Up state. As a result, protection switching cannot be
triggered. To eliminate the impact of bit errors on services, you must manually restore the link quality.
Usage Scenario
If a trunk interface is deployed, deploy bit-error-triggered trunk update to cope with bit errors
detected on trunk member interfaces. Trunk-bit-error-triggered IGP route switching is
recommended.
Background
To cope with link bit errors along an RSVP-TE tunnel and reduce the impact of bit errors on
services, deploy bit-error-triggered RSVP-TE tunnel switching. After the deployment is
complete, service traffic is switched from the primary CR-LSP to the backup CR-LSP if bit
errors occur.
Implementation Principles
On the network shown in Figure 1-283, trigger-LSP bit error detection must be enabled on
each node's interfaces on the RSVP-TE tunnels. To implement dual-ended switching,
configure the RSVP-TE tunnels in both directions as bidirectional associated CR-LSPs. If a
node on a CR-LSP detects bit errors in a direction, the ingress of the tunnel obtains the BER
of the CR-LSP after BER calculation and advertisement. For details, see 1.5.9.2.1 Bit Error
Detection.
The ingress then determines the bit error status of the CR-LSP based on the BER threshold
configured for the RSVP-TE tunnel. For rules for determining the bit error status of the
CR-LSP, see Figure 1-284.
If the BER of the CR-LSP is greater than or equal to the switchover threshold of the
RSVP-TE tunnel, the CR-LSP enters the excessive BER state.
If the BER of the CR-LSP falls below the switchback threshold, the CR-LSP changes to
the normalized BER state.
Figure 1-284 Rules for determining the bit error status of the CR-LSP
After the bit error statuses of the primary and backup CR-LSPs are determined, the RSVP-TE
tunnel determines whether to perform a primary/backup CR-LSP switchover based on the
following rules:
If the primary CR-LSP is in the excessive BER state, the RSVP-TE tunnel attempts to
switch traffic to the backup CR-LSP.
If the primary CR-LSP changes to the normalized BER state or the backup CR-LSP is in
the excessive BER state, traffic is switched back to the primary CR-LSP.
The RSVP-TE tunnel in the opposite direction also performs the same switchover, so that
traffic in the upstream and downstream directions is not transmitted over the CR-LSP with bit
errors.
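The hysteresis and switchover rules above can be sketched as follows. The threshold values are illustrative assumptions, not device defaults; the state and selection logic follows the rules described in this section.

```python
# Hedged sketch of the BER hysteresis and primary/backup CR-LSP switchover
# rules. SWITCHOVER_BER and SWITCHBACK_BER are assumed example thresholds.

SWITCHOVER_BER = 1e-5   # assumed switchover threshold of the RSVP-TE tunnel
SWITCHBACK_BER = 1e-7   # assumed switchback threshold (lower than switchover)

def ber_state(ber, current_state):
    """Return 'excessive' or 'normal' using the hysteresis rules."""
    if ber >= SWITCHOVER_BER:
        return "excessive"
    if ber < SWITCHBACK_BER:
        return "normal"
    return current_state    # between the thresholds: state is unchanged

def active_lsp(primary_state, backup_state):
    """Select the CR-LSP that should carry traffic."""
    if primary_state == "excessive" and backup_state != "excessive":
        return "backup"
    return "primary"        # primary normal, or both CR-LSPs degraded

# Example: the primary CR-LSP degrades, then recovers.
state = ber_state(2e-5, "normal")
assert active_lsp(state, "normal") == "backup"
state = ber_state(1e-8, state)
assert active_lsp(state, "normal") == "primary"
```

Note that a BER between the two thresholds leaves the state unchanged, which prevents flapping between the primary and backup CR-LSPs.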
Usage Scenario
If RSVP-TE tunnels are used as public network tunnels, deploy bit-error-triggered RSVP-TE
tunnel switching to cope with link bit errors along the tunnels.
Background
When PW redundancy is configured for L2VPN services, bit-error-triggered switching can be
configured. With this function, if bit errors occur, services can switch between the primary
and secondary PWs.
Principles
Trigger-LSP bit error detection must be enabled on each node's interfaces. PW redundancy
can be configured in either a single segment or multi-segment scenario.
Single-segment PW redundancy scenario
In Figure 1-285, PE1 establishes a primary PW to PE2 and a secondary PW to PE3,
which implements PW redundancy. If PE2 detects bit errors, the processing is as follows:
− PE2 switches traffic destined for PE1 to the path bypass PW -> PE3 -> secondary
PW -> PE1 and sends a BFD packet to notify PE1 of the bit errors.
− Upon receipt of the BFD packet, PE1 switches traffic destined for PE2 to the path
secondary PW -> PE3 -> bypass PW -> PE2.
Traffic between PE1 and PE2 can travel along bit-error-free links.
After traffic switches to the secondary PW, and bit errors are removed from the primary PW,
traffic switches back to the primary PW based on a configured switchback policy.
If an RSVP-TE tunnel is established for PWs, and bit-error-triggered RSVP-TE tunnel switching is
configured, a switchover is preferentially performed between the primary and hot-standby CR-LSPs in
the RSVP-TE tunnel. A primary/secondary PW switchover can be triggered only if the
primary/hot-standby CR-LSP switchover fails to remove bit errors (for example, if bit errors occur on
both the primary and hot-standby CR-LSPs).
Usage Scenario
If L2VPN is used to carry user services and PW redundancy is deployed to ensure reliability,
deploy bit-error-triggered switching for PW to minimize the impact of bit errors on user
services and improve service quality.
Background
On an FRR-enabled HVPN, bit-error-triggered switching can be configured for VPN routes.
With this function, if bit errors occur on the HVPN, VPN routes re-converge so that traffic
switches to a bit-error-free link.
Principles
Trigger-LSP bit error detection must be enabled on each node's interfaces. In Figure 1-287, an
HVPN is configured on an IP/MPLS backbone network. VPN FRR is configured on a UPE. If
SPE1 detects bit errors, the processing is as follows:
SPE1 reduces the Local Preference attribute value or increases the Multi-Exit
Discriminator (MED) attribute value. Then, the preference value of a VPN route that
SPE1 advertises to an NPE is reduced. As a result, the NPE selects the VPN route to
SPE2, not the VPN route to SPE1. Traffic switches to the standby link. In addition, SPE1
sends a BFD packet to notify the UPE of bit errors.
Upon receipt of the BFD packet, the UPE switches traffic to the standby link over the
VPN route destined for SPE2.
If the bit errors on the active link are removed, the UPE re-selects the VPN routes destined for
SPE1, and SPE1 restores the preference value of the VPN route to be advertised to the NPE.
Then the NPE also re-selects the VPN route destined for SPE1.
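The route re-selection described above follows BGP's preference for a higher Local_Pref (and, on a tie, a lower MED). The following sketch illustrates the effect; all attribute values are assumptions for illustration.

```python
# Hedged sketch: lowering the Local_Pref advertised for the route via SPE1
# makes the NPE prefer the route via SPE2. Values are illustrative.

def best_route(routes):
    # BGP prefers the highest Local_Pref; on a tie, the lowest MED.
    return max(routes, key=lambda r: (r["local_pref"], -r["med"]))

normal = [
    {"next_hop": "SPE1", "local_pref": 200, "med": 0},
    {"next_hop": "SPE2", "local_pref": 100, "med": 0},
]
# SPE1 detects bit errors and reduces its advertised Local_Pref.
degraded = [
    {"next_hop": "SPE1", "local_pref": 50, "med": 0},
    {"next_hop": "SPE2", "local_pref": 100, "med": 0},
]

assert best_route(normal)["next_hop"] == "SPE1"
assert best_route(degraded)["next_hop"] == "SPE2"
```

When the bit errors are cleared and SPE1 restores the original attribute values, the same comparison makes the NPE re-select the route via SPE1.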
If an RSVP-TE tunnel is established for an L3VPN, and bit-error-triggered RSVP-TE tunnel switching
is configured, a traffic switchover between the primary and hot-standby CR-LSPs in the RSVP-TE
tunnel is preferentially performed. An active/standby L3VPN route switchover can be triggered only if
the primary/hot-standby CR-LSP switchover fails to remove bit errors (for example, if bit errors occur
on both the primary and hot-standby CR-LSPs).
Usage Scenario
If L3VPN is used to carry user services and VPN FRR is deployed to ensure reliability,
deploy bit-error-triggered L3VPN switching to minimize the impact of bit errors on user
services and improve service quality.
Background
In PW/E-PW over static CR-LSP scenarios, if primary and secondary PWs are configured,
deploy bit-error-triggered protection switching. If bit errors occur, service traffic is switched
from the primary PW to the secondary PW.
Implementation Principles
The MAC-layer SD alarm function (Trigger-LSP type) must be enabled on interfaces, and
then MPLS-TP OAM must be deployed to monitor CR-LSPs/PWs. Static PWs/E-PWs are
classified as SS-PWs or MS-PWs.
In an SS-PW networking scenario (see Figure 1-288), the bit error generation and clearing
process is as follows:
Bit error generation:
If the BER on an inbound interface of the P node reaches a specified threshold, the CRC
module detects the bit error status of the inbound interface, notifies all static CR-LSP
modules, and constructs and sends AIS packets to PE2.
Upon receipt of the AIS packets, PE2 notifies static PWs established over the CR-LSPs
of the bit errors and instructs the TP OAM module to perform APS. APS triggers a
primary/backup CR-LSP switchover, and a PW established over the new primary
CR-LSP takes over traffic.
Bit error clearing: After bit errors are cleared, the CRC module no longer detects the bit error
status on the inbound interface. The CRC module informs the TP OAM module that the bit
errors have been cleared. Upon receipt of the notification, the TP OAM module stops sending
AIS packets to PE2 functioning as the egress. PE2 does not receive AIS packets after a
specified period and determines that the bit errors have been cleared. PE2 then generates an
AIS clear alarm and instructs the TP OAM to perform APS. APS triggers a primary/backup
CR-LSP switchover, and services are switched back to the PW over the primary CR-LSP.
In an MS-PW networking scenario (see Figure 1-289), the bit error generation and clearing
process is as follows:
Bit error generation:
The CRC module of an inbound interface on the SPE detects bit errors and determines to
send either an SF or SD alarm based on a specified BER threshold. The CRC module
then notifies the TP OAM module of the bit errors. The TP OAM module notifies the bit
error status, sends RDI packets, and performs APS. The APS module instructs the peer
node to perform a traffic switchover, which triggers a primary/backup CR-LSP
switchover. The PW established over the bit-error-free CR-LSP takes over traffic.
If the BER on an inbound interface of the SPE reaches a specified threshold, the CRC
module detects the bit error status of the inbound interface, sets all static CR-LSP
modules to the bit error status, and constructs and sends AIS packets to PE2.
Upon receipt of the AIS packets, PE2 notifies the TP OAM module. The TP OAM
module then performs APS, which triggers a primary/backup CR-LSP switchover. The
PW established over the bit-error-free CR-LSP takes over traffic.
Bit error clearing: After bit errors are cleared, the CRC module no longer detects the bit error
status on the inbound interface. The CRC module informs the TP OAM module that the bit
errors have been cleared. Upon receipt of the notification, the TP OAM module stops sending
AIS packets to PE2 functioning as the egress. PE2 does not receive AIS packets after a
specified period and determines that the bit errors have been cleared. PE2 then generates an
AIS clear alarm and instructs the TP OAM to perform APS. APS triggers a primary/backup
CR-LSP switchover, and services are switched back to the PW over the primary CR-LSP.
If a tunnel protection group has been deployed for static CR-LSPs carrying PWs/E-PWs, bit errors
preferentially trigger static CR-LSP protection switching. Bit-error-triggered PW protection switching is
performed only when bit-error-triggered static CR-LSP protection switching fails to protect services
against bit errors (for example, bit errors occur on both the primary and backup CR-LSPs).
Usage Scenario
If static CR-LSPs/PWs/E-PWs are used to carry user services and MPLS-TP OAM is
deployed to ensure reliability, deploy bit-error-triggered APS to minimize the impact of bit
errors on user services and improve service quality.
1.5.9.3 Applications
1.5.9.3.1 Application of Bit-Error-Triggered Protection Switching in a Scenario in Which
TE Tunnels Carry an IP RAN
Networking Description
Figure 1-290 shows typical L2VPN+L3VPN networking in an IP RAN application. A VPWS
based on an RSVP-TE tunnel is deployed at the access layer, an L3VPN based on an
RSVP-TE tunnel is deployed at the aggregation layer, and L2VPN access to L3VPN is
configured on the AGGs. To ensure reliability, deploy PW redundancy for the VPWS,
configure VPN FRR protection for the L3VPN, and configure hot-standby protection for the
RSVP-TE tunnels.
Feature Deployment
To prevent the impact of bit errors on services, deploy bit-error-triggered RSVP-TE tunnel
switching, bit-error-triggered PW switching, and bit-error-triggered L3VPN route switching in
the scenario shown in Figure 1-290. The deployment process is as follows:
Enable trigger-LSP bit error detection on each interface.
Bit-error-triggered RSVP-TE tunnel switching: Enable bit-error-triggered protection
switching on the RSVP-TE tunnel interfaces of the CSG and AGG1, and configure
thresholds for bit-error-triggered RSVP-TE tunnel switching.
Bit-error-triggered PW switching: Enable bit-error-triggered PW switching on the
interfaces that connect the CSG and AGG1 and the interfaces that connect the CSG and
AGG2.
Bit-error-triggered L3VPN route switching: Configure bit-error-triggered L3VPN route
switching in the VPNv4 view of AGG1.
Scenario 2
On the network shown in Figure 1-292, if bit errors occur on both locations 1 and 2, both the
primary and secondary links of the RSVP-TE tunnel between the CSG and AGG1 detect the
bit errors. In this case, bit-error-triggered RSVP-TE tunnel switching cannot protect services
against bit errors. The bit errors further trigger PW and L3VPN route switching.
After detecting the bit errors, the CSG performs a primary/secondary PW switchover and
switches upstream traffic to AGG2.
After detecting the bit errors, AGG1 reduces the priority of VPNv4 routes advertised to
RSG1, so that RSG1 preferentially selects VPNv4 routes advertised by AGG2.
Downstream traffic is then switched to AGG2.
Networking Description
Figure 1-293 shows typical L2VPN+L3VPN networking in an IP RAN application. A VPWS
based on an LDP LSP is deployed at the access layer, an L3VPN based on an LDP LSP is
deployed at the aggregation layer, and L2VPN access to L3VPN is configured on the AGGs.
To ensure reliability, deploy LDP and IGP synchronization for the LDP LSPs, and configure
Eth-Trunk interfaces on key links.
Feature Deployment
To prevent the impact of bit errors on services, deploy bit-error-triggered IGP route switching
in the scenario shown in Figure 1-293. Deploy trunk-bit-error-triggered IGP route switching
on the Eth-Trunk interfaces. The deployment process is as follows:
Enable link-quality bit error detection on each physical interface and Eth-Trunk member
interface.
Enable bit-error-triggered IGP route switching on each physical interface and Eth-Trunk
interface.
Scenario 2
On the network shown in Figure 1-295, if bit errors occur on location 2 (Eth-Trunk member
interface), AGG1 detects the bit errors.
If the number of member interfaces without bit errors is still higher than the lower
threshold for the Eth-Trunk interface's Up links, the Eth-Trunk interface deletes the
member interface with bit errors from the forwarding plane. In this case, service traffic is
still forwarded over a normal path.
If the number of member interfaces without bit errors is lower than the lower threshold
for the Eth-Trunk interface's Up links, the Eth-Trunk interface ignores the bit errors on
the Eth-Trunk member interface and remains Up. However, the link quality level of the
Eth-Trunk interface becomes Low, triggering an IGP (OSPF or IS-IS) to increase the cost
of the Eth-Trunk interface's link. IGP routes then do not preferentially select the link.
AGG1 also uses a BFD packet to advertise the bit errors to the peer device, so that the
peer device also performs the same processing. Both upstream and downstream traffic is
then switched to the paths without bit errors.
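The per-member decision described above can be sketched as follows. The strict greater-than comparison against the lower threshold is an assumption for illustration.

```python
# Hedged sketch of the Eth-Trunk reaction to a member with bit errors:
# remove the faulty member only while enough error-free members remain;
# otherwise keep it, and let the Low link quality raise the IGP cost.

def trunk_reaction(ok_members, min_up_links):
    """ok_members: number of member interfaces without bit errors.
    min_up_links: lower threshold for the trunk interface's Up links."""
    if ok_members > min_up_links:
        return "remove faulty member"    # traffic stays on healthy members
    return "keep member, quality Low"    # trunk stays Up; IGP raises link cost

assert trunk_reaction(ok_members=5, min_up_links=2) == "remove faulty member"
assert trunk_reaction(ok_members=2, min_up_links=2) == "keep member, quality Low"
```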
Networking Description
Figure 1-296 shows a typical IP RAN. L2VPN services are carried on static CR-LSPs.
CR-LSP APS is configured to provide tunnel-level protection. Additionally, PW APS/E-PW
APS is configured for L2VPN services to provide service-level protection.
Feature Deployment
To meet high reliability requirements of the IP RAN and protect services against bit errors,
configure bit-error-triggered protection switching for the CR-LSPs/PWs. To do so, enable bit
error detection on the interfaces along the CR-LSPs/PWs, configure the switching type as
trigger-LSP, and configure bit error alarm generation and clearing thresholds. If the BER
reaches the bit error alarm threshold configured on an interface of a device along a static
CR-LSP or PW, the device determines that a bit error occurrence event has occurred and
notifies the MPLS-TP OAM module of the event. The MPLS-TP OAM module uses AIS
packets to advertise the bit error status to the egress, and then APS is used to trigger a traffic
switchover.
Terms
Term Definition
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
Network planning engineers
Commissioning engineers
Data configuration engineers
System maintenance engineers
Security Declaration
Encryption algorithm declaration
The encryption algorithms DES/3DES/SKIPJACK/RC2/RSA (RSA-1024 or
lower)/MD2/MD4/MD5 (in digital signature scenarios and password encryption)/SHA1
(in digital signature scenarios) provide low security and may bring security risks. If the
protocols allow it, using more secure encryption algorithms, such as AES/RSA
(RSA-2048 or higher)/SHA2/HMAC-SHA2, is recommended.
Password configuration declaration
− Do not set both the start and end characters of a password to "%^%#". This causes
the password to be displayed directly in the configuration file.
− To further improve device security, periodically change the password.
Personal data declaration
Your purchased products, services, or features may use some of users' personal data during
service operation or fault locating. You must define user privacy policies in compliance
with local laws and take proper measures to fully protect personal data.
Feature declaration
− The NetStream feature may be used to analyze the communication information of
terminal customers for network traffic statistics and management purposes. Before
enabling the NetStream feature, ensure that it is performed within the boundaries
permitted by applicable laws and regulations. Effective measures must be taken to
ensure that information is securely protected.
− The mirroring feature may be used to analyze the communication information of
terminal customers for a maintenance purpose. Before enabling the mirroring
function, ensure that it is performed within the boundaries permitted by applicable
laws and regulations. Effective measures must be taken to ensure that information is
securely protected.
− The packet header obtaining feature may be used to collect or store some
communication information about specific customers for transmission fault and
error detection purposes. Huawei does not unilaterally collect or store this
information. Before enabling the function, ensure that it is performed
within the boundaries permitted by applicable laws and regulations. Effective
measures must be taken to ensure that information is securely protected.
Reliability design declaration
Network planning and site design must comply with reliability design principles and
provide device-level and solution-level protection. Device-level protection includes
dual-network and inter-board dual-link planning principles to avoid a single point of
failure or a single link failure. Solution-level protection refers to fast convergence
mechanisms, such as FRR and VRRP.
Special Declaration
This document serves only as a guide. The content is written based on device
information gathered under lab conditions. The content provided by this document is
intended to be taken as general guidance, and does not cover all scenarios. The content
provided by this document may be different from the information on user device
interfaces due to factors such as version upgrades and differences in device models,
board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are
beyond the scope of this document.
The maximum values provided in this document are obtained in specific lab
environments (for example, only a certain type of board or protocol is configured on a
tested device). The maximum values actually obtained may differ from the
maximum values provided in this document due to factors such as differences in
hardware configurations and carried services.
Interface numbers used in this document are examples. Use the existing interface
numbers on devices for configuration.
The pictures of hardware in this document are for reference only.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
Indicates an imminently hazardous situation which, if not
avoided, will result in death or serious injury.
Change History
Updates between document issues are cumulative. Therefore, the latest document issue
contains all updates made in previous issues.
Changes in Issue 03 (2017-09-20)
This issue is the third official release. The software version of this issue is
V800R009C10SPC200.
Changes in Issue 02 (2017-07-30)
This issue is the second official release. The software version of this issue is
V800R009C10SPC100.
Changes in Issue 01 (2017-05-30)
This issue is the first official release. The software version of this issue is
V800R009C10.
1.6.2 Introduction
Definition
An interface is a point of interaction between devices on a network. Interfaces are classified
into physical and logical interfaces.
Physical interfaces physically exist on a device.
Logical interfaces are manually configured interfaces that do not exist physically.
Logical interfaces can be used to exchange data.
Purpose
A physical interface connects a device to another device using a transmission medium (for
example, a cable). The physical interface and transmission medium together form a
transmission channel that transmits data between the devices. Before data reaches a device, it
must pass through the transmission channel. In addition, sufficient bandwidth must be
provided to reduce channel congestion.
A logical interface does not require additional hardware, thereby reducing costs.
Benefits
Data can be transmitted properly over the transmission channel formed by a physical
interface and a transmission medium, thereby enabling communication between users.
Data communication can be implemented using logical interfaces, without additional
hardware requirements.
1.6.3 Principles
1.6.3.1 Basic Concepts
Interface Types
The router exchanges data and interacts with other devices on a network through interfaces.
Interfaces are classified into physical and logical interfaces.
Physical Interfaces
Physical interfaces physically exist on boards. They are divided into the following types:
− LAN interfaces: interfaces through which the router can exchange data with the
devices on a LAN.
− WAN interfaces: interfaces through which the router can exchange data with remote
devices on a WAN.
Logical Interfaces
Logical interfaces are manually configured interfaces that do not exist physically.
Logical interfaces can be used to exchange data.
Table 1-81 Command views and prompts of physical interfaces supported by the NE20E
MTU
The maximum transmission unit (MTU) is the size (in bytes) of the longest packet that can be
transmitted on a physical network. The MTU is very important for interworking between two
devices on a network. If the size of a packet exceeds the MTU supported by a transit node or a
receiver, the transit node or receiver may fragment the packet before forwarding it or may
even discard it, increasing the network transmission load. MTU values must be correctly
negotiated between devices to ensure that packets reach the receiver.
If fragmentation is disallowed, packet loss may occur during data transmission at the IP
layer. To ensure that long packets are not discarded during transmission, configure
forcible fragmentation for long packets.
When an interface with a small MTU receives long packets, the packets have to be
fragmented. Consequently, when the quality of service (QoS) queue becomes full, some
packets may be discarded.
If an interface has a large MTU, packets may be transmitted at a low speed.
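As a rough illustration of the fragmentation behavior described above, the following sketch counts the IPv4 fragments needed for a given MTU, assuming a 20-byte header without options. It is a simplified model, not device behavior.

```python
# Hedged sketch: an IPv4 packet larger than the link MTU must be fragmented
# (fragment payloads are multiples of 8 bytes), or dropped if the DF bit
# forbids fragmentation.

import math

IPV4_HEADER = 20  # bytes, assuming no IP options

def fragment_count(packet_len, mtu, df=False):
    if packet_len <= mtu:
        return 1
    if df:
        return 0  # dropped: fragmentation needed but disallowed
    payload_per_frag = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil((packet_len - IPV4_HEADER) / payload_per_frag)

assert fragment_count(1400, 1500) == 1
assert fragment_count(4000, 1500) == 3   # 3980 bytes of payload / 1480
assert fragment_count(4000, 1500, df=True) == 0
```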
Control-Flap
The status of an interface on a device may alternate between Up and Down for various
reasons, including physical signal interference and incorrect link layer configurations. The
changing status causes Multiprotocol Label Switching (MPLS) and routing protocols to flap.
As a result, the device may break down, causing network interruption. Control-flap controls
the frequency of interface status alternations between Up and Down to minimize the impact
on device and network stability.
The following two control modes are available.
control-flap
Concepts of control-flap:
− Penalty value and threshold
An interface is suppressed or freed from suppression based on the penalty value.
Penalty value: This value is calculated based on the status of the interface
using the suppression algorithm. The penalty value increases each time the
interface status changes and decreases by half every half life.
Suppression threshold (suppress): The interface is suppressed when the penalty
value is greater than the suppression threshold.
Reuse threshold (reuse): The interface is no longer suppressed when the
penalty value is smaller than the reuse threshold.
Ceiling threshold (ceiling): The penalty value no longer increases when the
penalty value reaches the ceiling threshold.
The parameter configuration complies with the following rule: reuse threshold
(reuse) < suppression threshold (suppress) < maximum penalty value (ceiling).
− Half life
When an interface goes Down for the first time, the half life starts. The device
applies the half life that matches the current interface status. If a specific half
life is reached, the penalty value decreases by half. Once a half life ends, another
half life starts.
Half life when an interface is Up (decay-ok): When the interface is Up, if the
period since the end of the previous half life reaches the current half life, the
penalty value decreases by half.
Half life when an interface is Down (decay-ng): When the interface is Down,
if the period since the end of the previous half life reaches the current half life,
the penalty value decreases by half.
− Maximum suppression time: The maximum suppression time of an interface is 30
minutes. When the period during which an interface is suppressed reaches the
maximum suppression time, the interface is automatically freed from suppression.
− Penalty value: This value is calculated based on the status of the interface using the
suppression algorithm. The core of the suppression algorithm is that the penalty
value increases each time the interface status changes and decreases
exponentially over time.
− Suppression threshold: The interface is suppressed when the penalty value is greater
than the suppression threshold. The suppression threshold must be greater than the
reuse threshold and smaller than the ceiling threshold.
− Reuse threshold: The interface is no longer suppressed when the penalty value is
smaller than the reuse threshold. The reuse threshold must be smaller than the
suppression threshold.
− Ceiling threshold: The penalty value no longer increases when the penalty value
reaches the ceiling threshold. The ceiling threshold must be greater than the
suppression threshold.
You can set the preceding parameters on the NE20E to restrict the frequency at which an
interface can alternate between Up and Down.
Principles of interface flapping control:
In Figure 1-297, the default penalty value of an interface is 0. The penalty value
increases by 400 each time the interface goes Down. When an interface goes Down for
the first time, the half life starts. The system checks whether the specific half life expires
at an interval of 1s. If the specific half life expires, the penalty value decreases by half.
Once a half life ends, another half life starts.
− If the penalty value reaches suppress, the interface is suppressed. When the
interface is suppressed, the outputs of the display interface, display interface brief,
and display ip interface brief commands show that the protocol status of the
interface remains DOWN(dampening suppressed) and does not change with the
physical status.
− If the penalty value falls below reuse, the interface is freed from suppression. When
the interface is freed from suppression, the protocol status of the interface is in
compliance with the actual status and does not remain Down (dampening
suppressed).
− If the penalty value reaches ceiling, the penalty value no longer increases.
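The control-flap behavior above can be sketched as follows. The per-Down increment of 400 follows the example in this section; the suppress, reuse, and ceiling values are illustrative assumptions.

```python
# Hedged sketch of control-flap: the penalty increases by 400 on each Down
# event (capped at ceiling) and halves after each half life. An interface is
# suppressed above suppress and freed from suppression below reuse.

SUPPRESS, REUSE, CEILING = 1000, 500, 6000   # illustrative thresholds
INCREMENT = 400                              # per-Down increment (example value)

def on_down(penalty):
    return min(penalty + INCREMENT, CEILING)

def after_half_life(penalty):
    return penalty / 2

penalty = 0
for _ in range(4):                 # four flaps in quick succession
    penalty = on_down(penalty)
suppressed = penalty > SUPPRESS    # protocol status: DOWN(dampening suppressed)
assert suppressed and penalty == 1600

while penalty >= REUSE:            # penalty decays half life by half life
    penalty = after_half_life(penalty)
suppressed = False                 # freed from suppression below reuse
assert penalty < REUSE
```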
damp-interface
Related concepts:
− penalty value: a value calculated by a suppression algorithm based on an interface's
flapping. The suppression algorithm increases the penalty value by a specific value
each time an interface goes Down and decreases the penalty value exponentially
each time the interface goes Up.
− suppress: An interface is suppressed if the interface's penalty value is greater than
the suppress value.
− reuse: An interface is no longer suppressed if the interface's penalty value is less
than the reuse value.
− ceiling: the maximum penalty value, calculated using the formula ceiling = reuse x
2^(max-suppress-time/half-life-period). An interface's penalty value no longer increases when
it reaches ceiling.
− half-life-period: period that the penalty value takes to decrease to half. A
half-life-period begins to elapse when an interface goes Down for the first time. If
a half-life-period elapses, the penalty value decreases to half, and another
half-life-period begins.
− max-suppress-time: maximum period during which an interface's status is
suppressed. After max-suppress-time elapses, the interface's actual status is
reported to upper layer services.
Figure 1-298 shows the relationship between the preceding parameters. To facilitate
understanding, figures in Figure 1-298 are all multiplied by 1000.
At t1, an interface goes Down, and its penalty value increases by 1000. Then, the
interface goes Up, and its penalty value decreases exponentially based on the half-life
rule. At t2, the interface goes Down again, and its penalty value increases by 1000,
reaching 1600, which has exceeded the suppress value 1500. At this time if the interface
goes Up again, its status is suppressed. As the interface keeps flapping, its penalty value
keeps increasing until it reaches the ceiling value 10000 at tA. As time goes by, the
penalty value decreases and reaches the reuse value 750 at tB. The interface status is
then no longer suppressed.
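The parameter relationship above can be checked numerically. The reuse value is taken from the example; the half-life-period and max-suppress-time are assumed values for illustration.

```python
# Hedged sketch of the damp-interface ceiling formula and decay behavior.
# reuse = 750 comes from the example above; the timers are assumptions.

REUSE = 750
HALF_LIFE = 15.0          # assumed half-life-period, in seconds
MAX_SUPPRESS = 60.0       # assumed max-suppress-time, in seconds

# ceiling = reuse x 2^(max-suppress-time / half-life-period)
ceiling = REUSE * 2 ** (MAX_SUPPRESS / HALF_LIFE)
assert ceiling == 12000   # 750 * 2^4

# A penalty at the ceiling decays back below reuse within roughly
# max-suppress-time, which is why suppression is bounded in time.
penalty, elapsed = ceiling, 0.0
while penalty >= REUSE:
    penalty /= 2
    elapsed += HALF_LIFE
assert elapsed <= MAX_SUPPRESS + HALF_LIFE
```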
Loopback interfaces, Layer 2 interfaces converted from Layer 3 interfaces using the portswitch
command, and NULL interfaces do not support setting the maximum transmission unit (MTU) or
deploying control-flap.
An interface monitoring group monitors the status of all its binding interfaces. When a specific
proportion of binding interfaces go Down, the track interface associated with the interface monitoring
group goes Down, which causes traffic to be switched from the master link to the backup link.
When the number of Down binding interfaces falls below a specific threshold, the track
interface goes Up, and traffic is switched back to the master link.
In the example network shown in Figure 1-299, ten binding interfaces are located on the
network side, and two track interfaces are located on the user side. You can set a Down weight
for each binding interface and a Down weight threshold for each track interface. For example,
the Down weight of each binding interface is set to 10, and the Down weight thresholds of
track interfaces A and B are set to 20 and 80, respectively. When the number of Down binding
interfaces in the interface monitoring group increases to 2, the system automatically instructs
track interface A to go Down. When the number of Down binding interfaces in the interface
monitoring group increases to 8, the system automatically instructs track interface B to go
Down. When the number of Down binding interfaces in the interface monitoring group falls
below 8, track interface B automatically goes Up. When the number of Down binding
interfaces in the interface monitoring group falls below 2, track interface A automatically goes
Up.
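The example above maps directly to a small sketch, with the weights and thresholds taken from the example.

```python
# Hedged sketch of the interface monitoring group example: ten binding
# interfaces each with Down weight 10; track interfaces A and B have Down
# weight thresholds 20 and 80.

BINDING_WEIGHT = 10
THRESHOLDS = {"A": 20, "B": 80}

def track_states(down_binding_count):
    weight = down_binding_count * BINDING_WEIGHT
    return {name: ("Down" if weight >= thresh else "Up")
            for name, thresh in THRESHOLDS.items()}

assert track_states(1) == {"A": "Up", "B": "Up"}
assert track_states(2) == {"A": "Down", "B": "Up"}    # 2 members Down -> A Down
assert track_states(8) == {"A": "Down", "B": "Down"}  # 8 members Down -> B Down
assert track_states(7) == {"A": "Down", "B": "Up"}    # below 8 again -> B Up
```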
1.6.4 Applications
1.6.4.1 Sub-interface
In the network shown in Figure 1-300, multiple sub-interfaces are configured on the physical
interface of Device. Like a physical interface, each sub-interface can be configured with one
IP address. The IP address of a sub-interface must be on the same network segment as the IP
address of a remote network, and the IP address of each sub-interface must be on a unique
network segment.
1.6.4.2 Eth-Trunk
In the network shown in Figure 1-301, an Eth-Trunk that bundles two full-duplex 1000 Mbit/s
interfaces is established between Device A and Device B. The maximum bandwidth of the
trunk link is 2000 Mbit/s.
Backup is enabled within the Eth-Trunk. If a link fails, traffic is switched to the other link to
ensure link reliability.
In addition, network congestion can be avoided because traffic between Device A and Device
B is balanced between the two member links.
The application and networking diagram of IP-Trunk are similar to those of Eth-Trunk.
Improving Reliability
IP address unnumbered
When an interface needs an IP address only for a short period, it can borrow an IP
address from another interface to save IP address resources. The interface is usually
configured to borrow a loopback interface address because loopback interfaces are stable.
Router ID
Some dynamic routing protocols require that routers have IDs. A router ID uniquely
identifies a router in an autonomous system (AS).
If no router ID is configured for OSPF or BGP, the system selects the largest IP address among the local interface IP addresses as the router ID. If the IP address of a physical interface is selected and that interface goes Down, the system does not reselect a router ID until the selected IP address is deleted.
Because the loopback interface is stable and always Up, it is recommended as the router
ID of a router.
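As a sketch of the selection behavior described above (the function and data layout are hypothetical illustrations, not VRP code):

```python
# Hypothetical sketch of router ID selection: the largest local interface
# IP address is chosen, and loopback addresses are preferred because a
# loopback interface is always Up. Illustrative only.
import ipaddress

def select_router_id(interfaces):
    """interfaces: list of (interface_name, ip_string) tuples."""
    loopbacks = [ip for name, ip in interfaces if name.startswith("Loopback")]
    candidates = loopbacks or [ip for _, ip in interfaces]
    # Compare addresses numerically, not lexically.
    return str(max(candidates, key=lambda ip: int(ipaddress.IPv4Address(ip))))

ifs = [("GigabitEthernet0/1/0", "10.1.1.1"),
       ("GigabitEthernet0/1/1", "192.168.1.1"),
       ("Loopback0", "10.255.0.1")]
print(select_router_id(ifs))  # 10.255.0.1 (loopback preferred)
```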
BGP
To prevent BGP sessions from being affected by physical interface faults, you can
configure a loopback interface as the source interface that sends BGP packets.
When a loopback interface is used as the source interface of BGP packets, note the
following:
− The loopback interface address of the BGP peer must be reachable.
− In the case of an EBGP connection, EBGP is allowed to establish neighbor
relationships through indirectly connected interfaces.
MPLS LDP
In MPLS LDP, a loopback interface address is often used as the transmission address to
ensure network stability. This IP address could be a public network address.
Classifying information
SNMP
To ensure server security, a loopback interface address, rather than the outbound interface address, is used as the source IP address of SNMP trap messages. The system then allows only packets from the loopback interface address to access the SNMP port, which filters packets and protects the SNMP management system. This also facilitates reading and writing trap messages.
NTP
The Network Time Protocol (NTP) synchronizes the time of all devices. NTP specifies a
loopback interface address as the source address of the NTP packets sent from the local
router.
To ensure the security of NTP, NTP specifies a loopback interface address rather than the
outbound interface address as the source address. In this situation, the system allows
only the packets from the loopback interface address to access the NTP port. In this
manner, packets are filtered to protect the NTP system.
Information recording
During the display of network traffic records, a loopback interface address can be
specified as the source IP address of the network traffic to be output.
In this manner, packets are filtered to facilitate network traffic collection. This is because
only the packets from the loopback interface address can access the specified port.
Security
Identifying the source IP address of logs on the user log server helps to locate the source
of the logs rapidly. It is recommended that you configure a loopback address as the
source IP address of log messages.
HWTACACS
Application Scenario
The Null0 interface does not forward packets. All packets sent to this interface are discarded.
The Null0 interface is applied in two situations:
Loop prevention
The Null0 interface is typically used to prevent routing loops. For example, during route
aggregation, a route to the Null0 interface is always created.
In the example network shown in Figure 1-302, Device A provides access services for
multiple remote nodes.
Device A is the gateway of the local network that uses the Class B network segment
address 172.16.1.1/16. Device A connects to three subnets through Device B, Device C,
and Device D respectively.
Figure 1-302 Example for using the Null0 interface to prevent routing loops
Therefore, configuring a static route on Device A whose outgoing interface is the Null0
interface can prevent routing loops.
Traffic filtering
The Null0 interface provides an optional method for filtering traffic. Unnecessary
packets are sent to the Null0 interface to avoid using an Access Control List (ACL).
Both the Null0 interface and ACL can be used to filter traffic as follows.
− Before the ACL can be used, ACL rules must be configured and then applied to an
interface. When a router receives a packet, it searches the ACL.
If the action is permit, the router searches the forwarding table and then
determines whether to forward or discard the packet.
If the action is deny, the router discards the packet.
− The Null0 interface must be specified as the outbound interface of unnecessary
packets. When a router receives a packet, it searches the forwarding table. If the
router finds that the outbound interface of the packet is the Null0 interface, it
discards the packet.
Using the Null0 interface to filter traffic is simpler and faster: filtering with the Null0 interface requires only a route, whereas filtering with an ACL requires an ACL rule to be configured and then applied to the corresponding interface on a router.
The Null0 interface, however, can filter only route-based traffic, whereas an ACL can filter both route-based and interface-based traffic.
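The route-based discard behavior can be illustrated with a minimal longest-prefix-match sketch; the route table contents and function names are assumptions for illustration only:

```python
# Illustrative forwarding sketch: a Null0 route silently discards packets.
import ipaddress

ROUTES = {  # prefix -> outbound interface; "Null0" means discard
    ipaddress.ip_network("172.16.0.0/16"): "GigabitEthernet0/1/0",
    ipaddress.ip_network("192.168.100.0/24"): "Null0",
}

def forward_by_route(dst_ip):
    """Longest-prefix match on ROUTES; Null0 routes drop the packet."""
    matches = [(net, iface) for net, iface in ROUTES.items()
               if ipaddress.ip_address(dst_ip) in net]
    if not matches:
        return "discard (no route)"
    _, iface = max(matches, key=lambda m: m[0].prefixlen)
    return "discard (Null0)" if iface == "Null0" else f"forward via {iface}"

print(forward_by_route("172.16.1.5"))     # forward via GigabitEthernet0/1/0
print(forward_by_route("192.168.100.9"))  # discard (Null0)
```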
To resolve this problem, you can configure an interface monitoring group and add multiple
network-side interfaces on the PEs to the interface monitoring group. When a link failure
occurs on the network side and the interface monitoring group detects that the status of a
certain proportion of network-side interfaces changes, the system instructs the user-side
interfaces associated with the interface monitoring group to change their status accordingly
and allows traffic to be switched between the master and backup links. Therefore, the
interface monitoring group can be used to prevent traffic overloads or interruptions.
1.7.2 Ethernet
1.7.2.1 Introduction
Overview
Ethernet technology originated from an experimental network on which multiple PCs were
connected at 3 Mbit/s. In general, Ethernet refers to a standard connection for 10 Mbit/s
Ethernet networks. The Digital Equipment Corporation (DEC), Intel, and Xerox joined efforts
to develop and then issue Ethernet technology in 1982. The IEEE 802.3 standard is based on
and compatible with the Ethernet standard.
In this document, Ethernet_II indicates Ethernet frames in Ethernet_II format, and IEEE 802.3 indicates Ethernet frames in IEEE 802.3 format.
Purpose
Ethernet and token ring networks are typical local area networks (LANs).
Ethernet has become the most important LAN networking technology because it is flexible,
simple, and easy to implement.
Shared Ethernet
Initially, Ethernet networks were shared networks with 10M Ethernet technology.
Ethernet networks were constructed with coaxial cables, and computers and terminals were connected through intricate connectors. This structure was complex and suitable only for communications in half-duplex mode because only one shared line existed.
In 1990, 10BASE-T Ethernet based on twisted pair cables emerged. In this technology,
terminals are connected to a hub through twisted pair cables and communicate through a
shared bus in the hub. The structure is physically a star topology. CSMA/CD is still used
because inside the hub, all terminals are connected to a shared bus.
All the hosts are connected to a coaxial cable in a similar manner. When a large number
of hosts exist, the following problems arise:
− Reliability of the media is low.
− Media access conflicts are severe.
− Packets are not properly broadcast.
− Security is not ensured.
100M Ethernet
100M Ethernet works at a higher rate (10 times the rate of 10M Ethernet) and differs
from 10M Ethernet in the following ways:
− Network type: 10M Ethernet supports only a shared Ethernet, while 100M Ethernet
is a 10M/100M auto-sensing Ethernet and can work in half-duplex or full-duplex
mode.
− Negotiation mechanism: 10M Ethernet uses Normal Link Pulses (NLPs) to detect
the link connection status, while 100M Ethernet uses auto-negotiation between two
link ends.
Gigabit Ethernet (GE) and 10GE
With the advancement of computer technology, applications such as large-scale
distributed databases and high-speed transmission of video images emerged. Those
applications require high bandwidth, and traditional 100M Fast Ethernet (FE) cannot
meet the requirements. GE was introduced to provide higher bandwidth.
GE inherits the data link layer of traditional Ethernet, which protects earlier investments in traditional Ethernet. GE and traditional Ethernet, however, have different physical layers: to transmit data at 1000 Mbit/s, GE uses optical fiber channels.
As computer science develops, the 10GE technology becomes mature and is widely used
on Datacom backbone networks. This technology is also used to connect high-end
database servers.
1.7.2.2 Principles
1.7.2.2.1 Ethernet Physical Layer
In these cabling standards, 10, 100, and 1000 represent the transmission rate (in Mbit/s), and
BASE represents baseband.
10M Ethernet cable standard
Table 1-85 lists the 10M Ethernet cabling standard specifications defined in IEEE 802.3.
The greatest limitation of coaxial cable is that devices on the cable are connected in
series, so a single point of failure (SPOF) may cause a breakdown of the entire network.
As a result, the physical standards of coaxial cables, 10BASE-2 and 10BASE-5, have
fallen into disuse.
100M Ethernet cable standard
100M Ethernet is also called Fast Ethernet (FE). Compared with 10M Ethernet, 100M
Ethernet has a faster transmission rate at the physical layer, but has the same rate at the
data link layer.
Table 1-86 lists the 100M Ethernet cable standard specifications.
10BASE-T and 100BASE-TX have different transmission rates, but both apply to Category 5 twisted pair cables. 10BASE-T transmits data at 10 Mbit/s, while 100BASE-TX transmits data at 100 Mbit/s.
Using Gigabit Ethernet technology, you can upgrade an existing Fast Ethernet network
from 100 Mbit/s to 1000 Mbit/s.
The physical layer of Gigabit Ethernet uses 8B/10B coding. In traditional Ethernet technology, the data link layer delivers 8-bit data groups to the physical layer for transmission.
This process is different on Gigabit Ethernet over optical fibers, where the physical layer maps each 8-bit data group to a 10-bit code group before transmitting it on the line.
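A quick calculation shows the consequence of the 8B/10B mapping, assuming the standard 10-bits-per-byte expansion:

```python
# 8B/10B transmits 10 code bits for every 8 data bits, so the optical line
# must signal 25% faster than the data rate.
data_rate_mbps = 1000                      # Gigabit Ethernet data rate
line_rate_mbaud = data_rate_mbps * 10 / 8  # 10 code bits per 8 data bits
print(line_rate_mbaud)                     # 1250.0
```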
10GE cable standards
IEEE 802.3ae is the 10GE cable standard. For 10GE, the cables are all optical fiber in
full-duplex mode.
The development of 10GE is well under way, and 10GE will be widely deployed in the future.
CSMA/CD
Concept of CSMA/CD
Ethernet was originally designed to connect stations, such as computers and peripherals,
on a shared physical line. However, the stations can only access the shared line in
half-duplex mode. Therefore, a mechanism of collision detection and avoidance is
required to enable multiple devices to share the same line in a way that gives each device
fair access. Carrier Sense Multiple Access with Collision Detection (CSMA/CD) was
therefore introduced.
The concept of CSMA/CD is as follows:
− CS: carrier sense
Before transmitting data, a station checks to see if the line is idle. In this manner,
chances of collision are decreased.
− MA: multiple access
The data sent by a station can be received by other stations.
− CD: collision detection
If two stations transmit electrical signals at the same time, the signals are
superimposed, doubling the normal voltage amplitude. This situation results in
collision.
The stations stop transmitting after sensing the conflict, and then resume
transmission after a random delay time.
Working process of CSMA/CD
CSMA/CD works as follows:
a. A station continuously checks whether the shared line is idle.
If the line is idle, the station sends data.
If the line is in use, the station waits until the line is idle.
b. If two stations send data at the same time, a conflict occurs on the line, and the
signal becomes unstable.
c. After detecting an instability, the station immediately stops sending data.
d. The station sends a series of pulses.
The pulses inform other stations that a conflict has occurred on the line.
After detecting a conflict, the station waits for a random period of time, and then
resumes the data transmission.
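The steps above can be sketched in simplified form; the station model, timings, and backoff limits are illustrative assumptions rather than the exact IEEE 802.3 algorithm:

```python
# Simplified CSMA/CD sketch following steps a-d above. The shared line is
# modeled by two callbacks; this is an illustration, not the IEEE 802.3 spec.
import random

def csma_cd_send(line_busy, collision_detected, max_attempts=16):
    """Return a log of the actions a station takes to send one frame."""
    log = []
    for attempt in range(max_attempts):
        if line_busy():                       # a. carrier sense: wait if busy
            log.append("wait: line busy")
            continue
        log.append("transmit")
        if not collision_detected():          # no conflict: frame sent
            log.append("success")
            return log
        log.append("jam signal")              # c/d. stop, then send pulses
        slots = random.randint(0, 2 ** min(attempt + 1, 10) - 1)
        log.append(f"back off {slots} slot times")  # random delay, retry
    log.append("give up")
    return log

# An idle, collision-free line: transmission succeeds immediately.
print(csma_cd_send(lambda: False, lambda: False))  # ['transmit', 'success']
```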
Ethernet Auto-Negotiation
Purpose of auto-negotiation
The earlier Ethernet used a 10 Mbit/s half-duplex mode that required CSMA/CD to
ensure access by all stations. The introduction of full-duplex mode and 100M Ethernet
created a need to achieve compatibility between the earlier and newer Ethernet
technologies.
Auto-negotiation technology achieves this compatibility by enabling the device on either
end of a link to choose the operation parameters. By exchanging information, the devices
negotiate parameters including half- or full-duplex mode, transmission speed, and flow
control. After the negotiation, the devices operate in the negotiated mode and rate.
Auto-negotiation is defined in the following standards:
− 100M Ethernet standard: IEEE 802.3u
In IEEE 802.3u, auto-negotiation is defined as an optional function.
− Gigabit Ethernet standard: IEEE 802.3z
In IEEE 802.3z, auto-negotiation is defined as a mandatory function.
Principle of auto-negotiation
The auto-negotiation mechanism applies to twisted pair links only.
When no data is transmitted over a twisted pair link, the link is not idle because the
devices on the link transmit pulse signals at low frequency. Each device can identify
these Fast Link Pulses (FLPs) and use them to transmit small amounts of data to
implement auto-negotiation, as shown in Figure 1-304.
Auto-negotiation priorities of the Ethernet duplex link are listed as follows in descending
order:
− 1000M full-duplex
− 1000M half-duplex
− 100M full-duplex
− 100M half-duplex
− 10M full-duplex
− 10M half-duplex
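The outcome of a successful negotiation can be sketched as selecting the highest-priority mode common to both ends; the mode labels and function are illustrative:

```python
# Sketch of priority-based auto-negotiation: both ends advertise their
# capabilities and the highest mode common to both is chosen. The list
# order mirrors the descending priorities above.
PRIORITY = ["1000M-full", "1000M-half", "100M-full",
            "100M-half", "10M-full", "10M-half"]

def negotiate(local_caps, remote_caps):
    """Return the highest-priority mode both ends support, or None."""
    common = set(local_caps) & set(remote_caps)
    for mode in PRIORITY:                # scan from highest priority down
        if mode in common:
            return mode
    return None                          # negotiation fails; link unusable

print(negotiate({"1000M-full", "100M-full"},
                {"100M-full", "100M-half"}))  # 100M-full
```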
If auto-negotiation succeeds, the Ethernet card activates the link. Then, data can be
transmitted over it. If auto-negotiation fails, the link is inaccessible.
Auto-negotiation is implemented at the physical layer and does not require any data
packets or have impact on upper-layer protocols.
Auto-negotiation rules for interfaces
Two connected interfaces can communicate with each other only when they are in the
same working mode.
− If both interfaces work in the same non-auto-negotiation mode, the interfaces can
communicate.
− If both interfaces work in auto-negotiation mode, the interfaces can communicate
through negotiation. The negotiated working mode depends on the interface with
lower capability. Specifically, if one interface works in full-duplex mode and the
other interface works in half-duplex mode, the negotiated working mode is
half-duplex. The auto-negotiation function also allows the interfaces to negotiate
the use of the traffic control function.
− If a local interface works in auto-negotiation mode and the remote interface works
in a non-auto-negotiation mode, the negotiated working mode of the local interface
depends on the working mode of the remote interface.
Table 1-88 describes the auto-negotiation rules for interfaces of the same type.
Table 1-88 Auto-negotiation rules for interfaces of the same type (local interface working in
auto-negotiation mode)
Table 1-89 describes the auto-negotiation rules for interfaces of different types.
According to the auto-negotiation rules described in Table 1-88 and Table 1-89, if
an interface works in auto-negotiation mode and the connected interface works in a
non-auto-negotiation mode, packets may be dropped or auto-negotiation may fail. It
is recommended that you configure two connected interfaces to work in the same
mode to ensure that they can communicate properly.
FE and higher-rate optical interfaces only support full-duplex mode.
Auto-negotiation is enabled on GE interfaces for the negotiation of traffic control.
When devices are directly connected using GE optical interfaces, auto-negotiation
is enabled on the optical interfaces to detect unidirectional optical fiber faults. If
one of two optical fibers is faulty, the fault information is synchronized on both
ends through auto-negotiation. As a result, interfaces on both ends go Down. After
the fault is rectified, the interfaces go Up again through auto-negotiation.
Hub
Hub principle
When terminals are connected through twisted pair cables, a convergence device called a
hub is required. Hubs operate at the physical layer. Figure 1-305 shows a hub operation
model.
A hub is a box with multiple interfaces, each of which can connect to a terminal. Multiple devices can therefore be connected through a hub to form a star topology.
Note that although the physical topology is a star, the hub uses bus and CSMA/CD
technologies.
MAC Sub-layer
Functions of the MAC sub-layer
The MAC sub-layer is responsible for the following:
− Accessing physical links
− Identifying stations at the data link layer
The MAC sub-layer reserves a unique MAC address to identify each station.
− Transmitting data over the data link layer. After receiving data from the LLC
sub-layer, the MAC sub-layer adds the MAC address and control information to the
data, and then transfers the data to the physical link. During this process, the MAC
sub-layer provides other functions, such as the check function.
Accessing physical links
The MAC sub-layer is associated with the physical layer so that different MAC
sub-layers provide access to different physical layers.
Ethernet has two types of MAC sub-layers:
− Half-duplex MAC: provides access to the physical layer in half-duplex mode.
− Full-duplex MAC: provides access to the physical layer in full-duplex mode.
The two types of MAC are integrated in a network interface card. After the network
interface card is initialized, auto-negotiation is performed to choose an operation mode,
and then a MAC is chosen according to the operation mode.
Identifying stations at the data link layer
The MAC sub-layer uses a MAC address to uniquely identify a station.
MAC addresses are managed by the Institute of Electrical and Electronics Engineers
(IEEE) and allocated in blocks. An organization, generally a vendor, obtains a unique
address block from the IEEE. The address block is called the Organizationally Unique
Identifier (OUI), and can be used by the organization to allocate addresses to 16,777,216
devices.
A MAC address consists of 48 bits, generally represented in dotted hexadecimal notation. For example, the 48-bit MAC address 000000001110000011111100001110011000000000110100 is generally represented as 00e0.fc39.8034.
The first 24 bits stand for the OUI; the last 24 bits are allocated by the vendor. For
example, in 00e0.fc39.8034, 00e0.fc is the OUI allocated by the IEEE to Huawei;
39.8034 is the address number allocated by Huawei.
The second bit of a MAC address indicates whether the address is globally or locally
unique. The Ethernet uses globally unique MAC addresses.
Ethernet uses the following types of MAC addresses:
− Physical MAC address
A physical MAC address is permanently stored in network interface hardware (such
as a network interface card) and is used to uniquely identify a terminal on an
Ethernet.
− Broadcast MAC address
A broadcast MAC address indicates all the terminals on a network.
The 48 bits of a broadcast MAC address are all 1s. In hexadecimal notation, this
address is ffff.ffff.ffff.
− Multicast MAC address
A multicast MAC address indicates a group of terminals on a network.
The eighth bit of a multicast MAC address is 1, such as
000000011011101100111010101110101011111010101000.
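The address classes above can be recognized from the bits of the first octet, as in this illustrative sketch (the helper names are assumptions):

```python
# Sketch of how the MAC address classes above are distinguished. The
# multicast check tests the lowest-order bit of the first octet, which
# corresponds to the "eighth bit" in the notation used above.
def classify_mac(mac):
    """mac: 'xxxx.xxxx.xxxx', 'xx:xx:...' or 'xx-xx-...' notation."""
    octets = bytes.fromhex(mac.replace(":", "").replace(".", "").replace("-", ""))
    if octets == b"\xff" * 6:
        return "broadcast"          # all 48 bits are 1s
    if octets[0] & 0x01:
        return "multicast"          # group address bit set
    return "unicast"

def oui(mac):
    """Return the 24-bit OUI (first three octets) as a hex string."""
    octets = bytes.fromhex(mac.replace(":", "").replace(".", "").replace("-", ""))
    return octets[:3].hex()

print(classify_mac("ffff.ffff.ffff"))  # broadcast
print(classify_mac("00e0.fc39.8034"))  # unicast
print(oui("00e0.fc39.8034"))           # 00e0fc (the OUI allocated to Huawei)
```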
Transmitting data at the data link layer
Data transmission at the data link layer is as follows:
a. The upper layer delivers data to the MAC sub-layer.
b. The MAC sub-layer stores the data in a buffer.
c. The MAC sub-layer adds the destination and source MAC addresses to the data,
calculates the length of the data frame, and forms Ethernet frames.
d. The Ethernet frame is sent to the peer according to the destination MAC address.
e. The peer compares the destination MAC address with entries in the MAC address
table.
If there is a matching entry, the frame is accepted.
If there is no matching entry, the frame is discarded.
The preceding describes frame transmission in unicast mode. After an upper-layer
application is added to a multicast group, the data link layer generates a multicast MAC
address according to the application, and then adds the multicast MAC address to the
MAC address table. The MAC sub-layer then receives frames with the multicast MAC
address and transmits the frames to the upper layer.
As shown in Figure 1-309, the format of an IEEE 802.3 frame is similar to that of an
Ethernet_II frame. In an IEEE 802.3 frame, however, the Type field is changed to the
Length field, and the LLC field and Sub-Network Access Protocol (SNAP) field occupy
8 bytes of the Data field.
− Length
The Length field specifies the number of bytes of the Data field.
− LLC
The LLC field consists of three sub-fields: Destination Service Access Point
(DSAP), Source Service Access Point (SSAP), and Control.
− SNAP
The SNAP field consists of the Org Code field and Type field. Three bytes of the
Org Code field are all 0s. The Type field functions the same as that in Ethernet_II
frames.
For descriptions of other fields, see the description of Ethernet_II frames.
Based on the values of DSAP and SSAP, IEEE 802.3 networks use the following types of
frames:
− If DSAP and SSAP are both 0xff, the IEEE 802.3 frame becomes a
NetWare-Ethernet frame bearing NetWare data.
− If DSAP and SSAP are both 0xaa, the IEEE 802.3 frame becomes an
Ethernet_SNAP frame.
Ethernet_SNAP frames can encapsulate the data of multiple protocols. The SNAP
can be considered as an extension of the Ethernet protocol. SNAP allows vendors to
invent their own Ethernet transmission protocols.
The Ethernet_SNAP standard is defined by IEEE 802.1 to help ensure compatibility between IEEE 802.3 LANs and Ethernet networks.
− Other values of DSAP and SSAP indicate IEEE 802.3 frames.
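The frame-type decisions described above can be sketched as follows; the field offsets are the standard ones, while the function itself is an illustrative assumption:

```python
# Sketch of frame classification: the 2-byte value after the MAC addresses
# distinguishes Ethernet_II (Type >= 0x0600) from IEEE 802.3 (Length), and
# DSAP/SSAP further classify IEEE 802.3 frames as described above.
def classify_frame(frame: bytes):
    type_or_length = int.from_bytes(frame[12:14], "big")
    if type_or_length >= 0x0600:          # Type field -> Ethernet_II
        return "Ethernet_II"
    dsap, ssap = frame[14], frame[15]     # first bytes of the LLC header
    if dsap == 0xFF and ssap == 0xFF:
        return "NetWare-Ethernet"
    if dsap == 0xAA and ssap == 0xAA:
        return "Ethernet_SNAP"
    return "IEEE 802.3 (LLC)"

header = b"\x00\xe0\xfc\x39\x80\x34" * 2             # dummy dst + src MAC
print(classify_frame(header + b"\x08\x00"))          # Ethernet_II
print(classify_frame(header + b"\x00\x40\xaa\xaa"))  # Ethernet_SNAP
```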
Jumbo frames
Jumbo frames are Ethernet frames of greater length complying with vendor standards.
Such frames are dedicated to Gigabit Ethernet.
Generally, an Ethernet frame carries a maximum of 1518 bytes. To transmit larger datagrams at the IP layer, the datagrams must be fragmented so that the data fits within standard Ethernet frames, and a frame header and a frame trailer are added to each frame during transmission. Jumbo frames carry more than 1518 bytes each; they were introduced to reduce this per-frame overhead and thereby improve network usage and transmission rate.
The two Ethernet interfaces that need to communicate must both support jumbo frames
so that NE20Es can merge several standard-sized Ethernet frames into a jumbo frame to
improve transmission efficiency.
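A rough per-frame overhead comparison shows why jumbo frames improve efficiency; the 38-byte overhead figure (header, FCS, preamble, and inter-frame gap) is a common approximation, not a value from this document:

```python
# Per-frame overhead: 18 bytes of header/FCS plus 20 bytes of preamble,
# SFD, and inter-frame gap (an approximation for illustration).
OVERHEAD = 18 + 20

def efficiency(payload_bytes):
    """Fraction of line capacity that carries payload."""
    return payload_bytes / (payload_bytes + OVERHEAD)

print(round(efficiency(1500), 4))  # standard maximum payload
print(round(efficiency(9000), 4))  # typical jumbo payload: less overhead
```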
LLC Sub-layer
As described, the MAC sub-layer supports IEEE 802.3 frames and Ethernet_II frames. In an
Ethernet_II frame, the Type field identifies the upper layer protocol. Therefore, on a device,
the LLC sub-layer is not needed and only the MAC sub-layer is required.
In an IEEE 802.3 frame, useful features are defined at the LLC sub-layer in addition to the
traditional services of the data link layer. These features are specified by the sub-fields of
DSAP, SSAP, and Control.
Networks can support the following types of point-to-point services:
Connection-less service
Currently, the Ethernet implements this service.
Connection-oriented service
The connection is set up before data is transmitted. The reliability of the data
transmission is ensured.
Connection-less data transmission with acknowledgement
The connection is not required before data transmission. The acknowledgement
mechanism is adopted to improve reliability.
The following is an example describing the application of SSAP and DSAP with terminals A
and B that use connection-oriented services. Data is transmitted using the following process:
1. A sends a frame to B to request a connection with B.
2. After receiving the frame, if B has enough resources, B returns an acknowledgement
message that contains a Service Access Point (SAP). The SAP identifies the connection
required by A.
3. After receiving the acknowledgement message, A knows that B has set up a local
connection between them. After creating a SAP, A sends a message containing the SAP
to B. The connection is set up.
4. The LLC sub-layer of A encapsulates the data into a frame. The DSAP field is filled in with the SAP sent by B; the SSAP field is filled in with the SAP created by A. Then the LLC sub-layer of A transfers the data to its MAC sub-layer.
5. The MAC sub-layer of A adds the MAC address and Length field to the frame, and then transfers the frame to the physical layer for transmission.
6. After the frame is received at the MAC sub-layer of B, the frame is transferred to the LLC sub-layer. The LLC sub-layer identifies the connection that the frame belongs to according to the DSAP field.
7. After checking and acknowledging the frame based on the connection type, the LLC sub-layer of B transfers the frame to the upper layer.
8. After the frame reaches its destination, A sends B a frame instructing B to release the
connection. At this time, the communications end.
1.7.2.3 Applications
1.7.2.3.1 Computer Interconnection
Computer interconnection is the principal object and the major application of Ethernet
technology.
In early Ethernet LANs, computers were connected through coaxial cables to access shared
directories or a file server. All the computers, whether they are servers or hosts, are equal on
this network.
However, because most traffic flows between clients and servers, the early traffic model led to
bottlenecks on servers.
After the introduction of full-duplex Ethernet technology and Ethernet switches, servers can
connect to high-speed interfaces (100 Mbit/s) on Ethernet switches. Clients can use
lower-speed interfaces. This approach reduces traffic bottlenecks. Modern operating systems provide distributed services and database services, and allow servers to communicate with clients and other servers for data synchronization. 100M FE cannot meet the resulting bandwidth requirements; therefore, 1000M Ethernet technology was introduced to meet them.
1.7.3 Trunk
1.7.3.1 Introduction
Definition
Trunk is a technology that bundles multiple physical interfaces into a single logical interface.
This logical interface is called a trunk interface, and each bundled physical interface is called
a member interface.
Trunk technology helps increase bandwidth, enhance reliability, and carry out load balancing.
Purpose
Without trunk technology, the transmission rate between two network devices connected by a
100 Mbit/s Ethernet twisted pair cable can only reach 100 Mbit/s. To obtain a higher
transmission rate, you must change the transmission media or upgrade the network to a
Gigabit Ethernet, which is costly for small- and medium-sized enterprises and schools.
Trunk technology provides an economical solution. For example, a trunk interface with three
100 Mbit/s member interfaces working in full-duplex mode can provide a maximum
bandwidth of 300 Mbit/s.
Both Ethernet interfaces and Packet over SONET/SDH (POS) interfaces can be bundled into a
trunk interface. These two types of interfaces, however, cannot be member interfaces of the
same trunk interface. The reasons are as follows:
Ethernet interfaces apply to a broadcast network where packets are sent to all devices on
the network.
POS interfaces apply to a P2P network, because the link layer protocol of POS interfaces
is High-level Data Link Control (HDLC), which is a point-to-point (P2P) protocol.
Benefits
This feature offers the following benefits:
Increased bandwidth
Improved link reliability through traffic load balancing
1.7.3.2 Principles
1.7.3.2.1 Basic Trunk Principles
The member links of a trunk link can be configured with different weights to carry out load
balancing, which helps ensure connection reliability and greater bandwidth.
Users can configure trunk interfaces to support various routing protocols and services.
Figure 1-310 shows a simple Eth-Trunk example in which two routers are directly connected
through three interfaces. These three interfaces are bundled into an Eth-Trunk interface at
both ends of the trunk link. In this way, the bandwidth is increased, and reliability is
improved.
A trunk link can be considered a point-to-point link. The devices at the two ends of the link
can both be routers, both be switches, or be a router on one end and a switch on the other.
A trunk has the following advantages:
Greater bandwidth
The total bandwidth of a trunk interface equals the sum of the bandwidth of all its
member interfaces. In this manner, the interface bandwidth is multiplied.
Higher reliability
If a member interface fails, traffic on the faulty link is then switched to an available
member link. This ensures higher reliability for the entire trunk link.
Load balancing
Load balancing can be carried out on a trunk interface, which distributes traffic among
its member interfaces and then transmits the traffic through the member links to the same
destination. This prevents network congestion that occurs when all traffic is transmitted
over one link.
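The per-flow distribution described above can be sketched as follows. This is a minimal illustration, not the NE20E hashing algorithm; the hash fields and interface names are assumptions.

```python
# Illustrative per-flow load balancing over trunk member links: packets of
# the same flow always hash to the same member link, preserving packet order.
import zlib

def select_member(src_mac: str, dst_mac: str, members: list) -> str:
    """Map a flow (identified here by its MAC pair) to one member link."""
    flow_key = f"{src_mac}-{dst_mac}".encode()
    index = zlib.crc32(flow_key) % len(members)
    return members[index]

members = ["GE0/1/0", "GE0/1/1", "GE0/1/2"]
# The same flow always selects the same member interface.
link = select_member("00e0-fc12-3456", "00e0-fc65-4321", members)
assert link == select_member("00e0-fc12-3456", "00e0-fc65-4321", members)
```

Because different flows hash to different member links, traffic is spread across the trunk instead of congesting a single link.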
MAC address
Each station or server connected to an Ethernet interface of a device has its own MAC address.
The MAC address table on the device records information about the MAC addresses of
connected devices.
When a Layer 3 router is connected to a Layer 2 switch through two Eth-Trunk links for
different services, if both Eth-Trunk interfaces on the router adopt the default system MAC
address, the system MAC address is learned by the switch and alternates between the two
Eth-Trunk interfaces. In this case, a loop may occur between the two devices. To prevent
loops, you can change the MAC address of an Eth-Trunk interface by using the mac-address
command. By configuring the source and destination MAC addresses for two Eth-Trunk links,
you can guarantee the normal transmission of service data flows and improve the network
reliability.
After the MAC address of an Eth-Trunk interface is changed, the device sends gratuitous ARP
packets to update the mapping relationship between MAC addresses and ports.
MTU
Generally, the IP layer controls the maximum length of frames that are sent each time. Any
time the IP layer receives an IP packet to be sent, it checks which local interface the packet
needs to be sent to and queries the MTU of the interface. Then, the IP layer compares the
MTU with the packet length to be sent. If the packet length is greater than the MTU, the IP
layer fragments the packet to ensure that the length of each fragment is smaller than or equal to the
MTU.
If forcible unfragmentation is configured, certain packets are lost during data transmission at
the IP layer. To ensure that jumbo packets are not dropped during transmission, configure
forcible fragmentation.
Generally, it is recommended that you adopt the default MTU value of 1500 bytes. If you
need to change the MTU of an Eth-Trunk interface, you need to change the MTU of the peer
Eth-Trunk interface to ensure that the MTUs of both interfaces are the same. Otherwise,
services may be interrupted.
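The MTU check and fragmentation step described above can be sketched as follows. This is a simplified illustration of the length comparison only; real IP fragmentation also aligns fragment offsets to 8-byte multiples and accounts for header overhead.

```python
def fragment(packet_len: int, mtu: int) -> list:
    """Split a payload of packet_len bytes into pieces no larger than the
    interface MTU. Simplified: ignores the 8-byte offset alignment and the
    IP header overhead of real fragmentation."""
    if packet_len <= mtu:
        return [packet_len]          # fits: send unfragmented
    fragments = []
    remaining = packet_len
    while remaining > 0:
        size = min(mtu, remaining)
        fragments.append(size)
        remaining -= size
    return fragments

# A 3000-byte packet over the default 1500-byte MTU yields two fragments.
print(fragment(3000, 1500))
```

This is why mismatched MTUs on the two ends of an Eth-Trunk link can interrupt services: one end may emit frames the other end considers oversized.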
Basic Concepts
Link aggregation
Link aggregation is a method of bundling several physical interfaces into a logical
interface to increase bandwidth and reliability.
Link aggregation group
A link aggregation group (LAG) or a trunk link is a logical link that aggregates several
physical links.
If all these aggregated links are Ethernet links, the LAG is called an Ethernet link
aggregation group, or an Eth-Trunk for short, and the interface at each end of the
Eth-Trunk link is called an Eth-Trunk interface.
Each interface that is added to the Eth-Trunk interface is called a member interface.
An Eth-Trunk interface can be considered a single Ethernet interface. The only
difference is that an Eth-Trunk interface needs to select one or more member Ethernet
interfaces before forwarding data. You can configure features on an Eth-Trunk interface
the same way as on a single Ethernet interface, except for some features that take effect
only on physical Ethernet interfaces.
The active interfaces selected by devices must be consistent at both ends; otherwise, the
LAG cannot be set up. To ensure the consistency of the active interfaces selected at both
ends, you can set a higher priority for one end. Then the other end can select the active
interfaces accordingly.
If neither of the devices at the two ends of an Eth-Trunk link is configured with the
system priority, the devices adopt the default value 32768. In this case, the Actor is
selected according to the system ID. That is, the device with the smaller system ID
becomes the Actor.
Interface LACP priority
An interface LACP priority is set to specify the priority of an interface to be selected as
an active interface. Interfaces with higher priorities are selected as active interfaces.
A smaller interface LACP priority value indicates a higher interface LACP priority.
M:N backup of member interfaces
Link aggregation in static LACP mode uses LACPDUs to negotiate active link selection.
This mode is also called M:N mode where M indicates the number of active links and N
indicates the number of backup links. This mode improves link reliability and
implements load balancing among the M active links.
On the network shown in Figure 1-311, M+N links with the same attributes (in the same
LAG) are set up between two devices. When data is transmitted over the aggregation link,
traffic is distributed among the active (M) links. No data is transmitted over the backup
(N) links. Therefore, the actual bandwidth of the aggregation link is the sum of the
bandwidth of the M links, and the maximum bandwidth that can be provided is the sum
of the bandwidth of M + N links.
If one of the M links fails, LACP selects one available backup link from the N links to
replace the faulty link. In this situation, the actual bandwidth of the aggregation link
remains the sum of the bandwidth of M links, but the maximum bandwidth that can be
provided is the sum of the bandwidth of M + N - 1 links.
M:N backup applies to the scenario where bandwidth of M links needs to be provided
and link redundancy is required. If an active link fails, an LACP-enabled device can
automatically select the backup link with the highest priority and add it to the LAG.
If no backup link is available and the number of Up member links is less than the lower
threshold for the number of Up links, the device shuts down the trunk interface.
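The backup-link replacement behavior described above can be sketched as follows. This is an illustrative model, not the device implementation; the interface names, priority values, and threshold are assumptions.

```python
def on_link_failure(failed, active, backup, min_up_links):
    """M:N backup: remove the failed link, promote the available backup
    link with the highest LACP priority (smallest value), and shut the
    trunk down if the number of Up links falls below the threshold."""
    active.remove(failed)
    if backup:
        best = min(backup)           # smallest value = highest priority
        backup.remove(best)
        active.append(best)
    if len(active) < min_up_links:
        return None                  # trunk interface is shut down
    return active

active = [(1, "GE0/1/0"), (2, "GE0/1/1")]
backup = [(3, "GE0/1/2")]
# GE0/1/0 fails; GE0/1/2 replaces it, so two links remain active.
new_active = on_link_failure((1, "GE0/1/0"), active, backup, min_up_links=2)
```

If `backup` were empty and only one link remained, the function would return `None`, modeling the trunk shutdown described above.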
In 1:1 master/backup mode, an LAG contains only two member interfaces. One interface
is the primary interface and the other is the backup interface. In normal situations, only
the master interface forwards traffic.
In manual mode, you must manually set up an Eth-Trunk and add an interface to the
Eth-Trunk. You must also manually configure member interfaces to be in the active state.
The manual 1:1 master/backup mode is used when the peer device does not support
LACP.
Manual load balancing mode
In this mode, you must manually create an Eth-Trunk interface and add member
interfaces to it. The LACP protocol is not required.
All member interfaces forward data and perform load balancing.
In manual load balancing mode, traffic can be evenly distributed among all member
interfaces. Alternatively, you can set different weights for member interfaces to
implement uneven load balancing. The interfaces set with greater weights transmit more
traffic.
If an active link of the LAG fails, traffic load balancing is implemented among the
remaining active links.
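The weighted (uneven) load balancing described above can be sketched as follows. This is an illustrative model; the weights and interface names are assumptions, not defaults.

```python
# Uneven load balancing: members with greater weights receive
# proportionally more flows.
import zlib

def build_table(weights: dict) -> list:
    """Expand per-member weights into a flat distribution table."""
    table = []
    for member, weight in weights.items():
        table.extend([member] * weight)
    return table

def pick(flow_key: str, table: list) -> str:
    """Hash a flow onto the weighted table."""
    return table[zlib.crc32(flow_key.encode()) % len(table)]

# GE0/1/0 has weight 2 and GE0/1/1 has weight 1, so roughly two thirds
# of flows map to GE0/1/0.
table = build_table({"GE0/1/0": 2, "GE0/1/1": 1})
```

Setting equal weights for all members degenerates to the even distribution case.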
LACP mode
In LACP mode, you also manually create a trunk interface and add member interfaces to
it. Compared with link aggregation in manual load balancing mode, active interfaces in
LACP mode are selected through the transmission of Link Aggregation Control Protocol
Data Units (LACPDUs). This means that when a group of interfaces are added to a trunk
interface, the status of each member interface (active or inactive) depends on the LACP
negotiation.
Table 1-90 shows the similarities and differences between the manual load balancing
mode and LACP mode.
In manual load balancing mode, all member interfaces participate in load balancing. This
mode applies when a large amount of link bandwidth is required between two directly
connected devices and one of them does not support LACP. As shown in Figure 1-312,
Device A supports LACP, while Device B does not.
In this mode, load balancing is carried out among all member interfaces. The NE20E supports
two types of load balancing:
Per-flow load balancing
Per-packet load balancing
2. Devices at both ends determine the Actor according to the system LACP priority and
system ID.
As shown in Figure 1-315, devices at both ends receive LACPDUs from each other.
When Device B receives LACPDUs from Device A, Device B checks and records
information about Device A and compares their system priorities. If the system priority
of Device A is higher than that of Device B, Device A functions as the Actor and Device
B selects active interfaces according to the interface priority of Device A. In this manner,
devices on both ends select the same active interfaces.
3. Devices at both ends determine active interfaces according to the LACP priorities and
interface IDs of the Actor.
On the network shown in Figure 1-316, after the devices at both ends determine the
Actor, both devices select active interfaces according to the interface priorities on the
Actor.
After the active interfaces are selected, the links to be included in the LAG are determined,
and load balancing is implemented among these active links.
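The two selection steps described above can be sketched as follows. This is an illustrative model; the IDs and priority values are assumptions, except for the 32768 default system priority stated earlier.

```python
def elect_actor(local, peer):
    """Each device is a (system_priority, system_id) pair; a smaller
    value wins. With equal priorities, the smaller system ID wins, and
    the Actor's interface priorities decide the active links."""
    return min(local, peer)

def select_active(interfaces, max_active):
    """interfaces: (lacp_priority, interface_id) pairs on the Actor.
    A smaller priority value means a higher priority; ties are broken
    by interface ID."""
    return sorted(interfaces)[:max_active]

# Both devices use the default system priority 32768, so the device
# with the smaller system ID becomes the Actor.
actor = elect_actor((32768, "00e0-fc00-0001"), (32768, "00e0-fc00-0002"))
# The Actor then selects the two highest-priority interfaces as active.
active = select_active([(100, 3), (90, 1), (100, 2)], max_active=2)
```

Because both ends apply the Actor's priorities, they arrive at the same set of active interfaces, which is the consistency requirement stated earlier.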
1.7.3.2.5 E-Trunk
Definition
Enhanced Trunk (E-Trunk) implements inter-device link aggregation, providing device-level
reliability.
Background
Eth-Trunk implements link reliability between single devices. However, if a device fails,
Eth-Trunk fails to take effect.
To improve network reliability, carriers introduced device redundancy with master and backup
devices. If the master device or primary link fails, the backup device can take over user
services. In this situation, another device must be dual-homed to the master and backup
devices, and inter-device link reliability must be ensured.
E-Trunk was introduced to meet the requirements. E-Trunk aggregates data links of multiple
devices to form a link aggregation group (LAG). If a link or device fails, services are
automatically switched to the other available links or devices in the E-Trunk, improving link
and device-level reliability.
Basic Concepts
The LACP E-Trunk system priority is used for the E-Trunk to which Eth-Trunk interfaces in static
LACP mode are added.
The LACP system priority is used for Eth-Trunk interfaces in static LACP mode.
The LACP E-Trunk system priority and LACP system priority can be changed. If both priorities are
configured, after an Eth-Trunk interface working in static LACP mode is added to an E-Trunk, only
the LACP E-Trunk system priority takes effect for the Eth-Trunk interface.
The LACP E-Trunk system ID is used for the E-Trunk to which Eth-Trunk interfaces in static LACP
mode are added.
The LACP system ID is used for Eth-Trunk interfaces in static LACP mode.
To change the LACP E-Trunk system ID, run the lacp e-trunk system-id command. The LACP
system ID can only be the MAC address of an Ethernet interface on IPU and cannot be changed.
E-Trunk priority
E-Trunk priorities determine the master/backup status of the devices in an aggregation
group. As shown in Figure 1-318, the smaller the E-Trunk priority value, the higher the
E-Trunk priority. PE1 has a higher E-Trunk priority than PE2, and therefore PE1 is the
master device while PE2 is the backup device.
E-Trunk ID
An E-Trunk ID is an integer that uniquely identifies an E-Trunk.
Working mode
The working mode is subject to the working mode of the Eth-Trunk interface added to
the E-Trunk group. The Eth-Trunk interface works in one of the following modes:
Automatic, Forcible master, and Forcible backup.
Timeout period
Normally, the master and backup devices in an E-Trunk periodically send Hello
messages to each other. If the backup device does not receive any Hello message within
the timeout period, it becomes the master device.
The timeout period is obtained through the formula: Timeout period = Sending period x
Multiplier.
If the multiplier is 3, the backup device becomes the master device if it does not receive
any Hello message within three consecutive sending periods.
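The E-Trunk master election and timeout calculation described above can be sketched as follows. This is an illustrative model; the priority values and sending period are assumptions.

```python
def elect_master(local_priority, local_id, peer_priority, peer_id):
    """A smaller E-Trunk priority value means a higher priority; the
    system ID breaks ties. The winner becomes the master device."""
    if (local_priority, local_id) < (peer_priority, peer_id):
        return "master"
    return "backup"

def timeout_period(send_period_s, multiplier):
    """Timeout period = sending period x multiplier."""
    return send_period_s * multiplier

# PE1 (priority 10) beats PE2 (priority 20), so PE1 is the master.
role = elect_master(10, "PE1", 20, "PE2")
# With a 10-second sending period and a multiplier of 3, the backup
# takes over after 30 seconds without a Hello message.
timeout = timeout_period(10, 3)
```

If the backup device receives no Hello message for `timeout` seconds, it promotes itself to master, as described above.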
The Eth-Trunk interfaces can work in either static LACP mode or manual load balancing mode. The
Eth-Trunk and E-Trunk configurations on PE1 and PE2 must be the same.
− CE end
Adding Eth-Trunk interfaces in static LACP mode to an E-Trunk: Create an
Eth-Trunk interface in static LACP mode on the CE, and add the CE interfaces
connecting to the PEs to the Eth-Trunk interface. This ensures link reliability.
Adding Eth-Trunk interfaces in manual load balancing mode to an E-Trunk: Create
an Eth-Trunk interface in manual load balancing mode on the CE, and add the CE
interfaces connecting to the PEs to the Eth-Trunk interface. Then, configure
Ethernet operation, administration and maintenance (OAM) on the CE and PEs,
ensuring link reliability.
The E-Trunk group is invisible to the CE.
When you configure IP addresses for Eth-Trunk interfaces connecting the CE and PEs to transmit Layer
3 services, the PE's Eth-Trunk interface configurations must meet the following requirements:
There are few scenarios in which IP addresses are configured for Eth-Trunk interfaces that connect the
CE and PEs to transmit Layer 3 services and that are added to an E-Trunk on the PEs. In most cases,
Eth-Trunk interfaces work as Layer 2 interfaces.
Table 1-91 Master/backup status of an E-Trunk and its member Eth-Trunk interfaces
Status of the Local E-Trunk | Working Mode of the Local Eth-Trunk Interface | Status of the Peer Eth-Trunk Interface | Status of the Local Eth-Trunk Interface
In normal situations:
− If PE1 functions as the master, Eth-Trunk 10 of PE1 functions as the master, and its
link status is Up.
− If PE2 functions as the backup, Eth-Trunk 10 of PE2 functions as the backup, and
its link status is Down.
If the link between the CE and PE1 fails, the following situations occur:
a. PE1 sends an E-Trunk packet containing information about the faulty Eth-Trunk 10
of PE1 to PE2.
b. After receiving the E-Trunk packet, PE2 finds that Eth-Trunk 10 on the peer is
faulty. Then, the status of Eth-Trunk 10 on PE2 becomes master. Through the
LACP negotiation, the status of Eth-Trunk 10 on PE2 becomes Up.
The Eth-Trunk status on PE2 becomes Up, and traffic of the CE is forwarded
through PE2. In this way, traffic destined for the peer CE is protected.
If PE1 is faulty, the following situations occur:
a. If the PEs are configured with BFD, PE2 detects that the BFD session status
becomes Down. PE2 then functions as the master, and Eth-Trunk 10 of PE2
functions as the master.
b. If the PEs are not configured with BFD, PE2 will not receive any E-Trunk packet
from PE1 before its timeout period runs out, after which PE2 will function as the
master and Eth-Trunk 10 of PE2 will function as the master.
Through the LACP negotiation, the status of Eth-Trunk 10 on PE2 becomes Up.
The traffic of the CE is forwarded through PE2. In this way, traffic destined for the
peer CE is protected.
BFD fast detection
A device cannot quickly detect a fault on its peer based on the timeout period of received
packets. In this case, BFD can be configured on the device. The peer end needs to be
configured with an IP address. After a BFD session is established to detect whether the
route to the peer is reachable, the E-Trunk can sense any fault detected by BFD.
Switchback mechanism
The local device is in master state. In such a situation, if the physical status of the
Eth-Trunk interface on the local device goes Down or the local device fails, the peer
device becomes the master and the physical status of the member Eth-Trunk interface
becomes Up.
When the local end recovers, the local end needs to function as the master. Therefore, the
local Eth-Trunk interface enters the LACP negotiation state. After being informed by
LACP that the negotiation ability is Up, the local device starts the switchback delay timer.
After the switchback delay timer times out, the local Eth-Trunk interface becomes the
master. After LACP negotiation, the Eth-Trunk interface becomes Up.
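The switchback sequence described above can be sketched as follows. This is an illustrative model; the delay value is an assumption.

```python
import time

def switch_back(negotiation_up: bool, delay_s: float) -> str:
    """After the local end recovers, it waits for the switchback delay
    timer before preempting the master role, which avoids flapping if
    the recovery is unstable."""
    if not negotiation_up:
        return "backup"            # LACP negotiation ability not yet Up
    time.sleep(delay_s)            # switchback delay timer
    return "master"                # preempt after the timer expires

# Once LACP reports the negotiation ability is Up and the delay timer
# expires, the local Eth-Trunk interface becomes the master again.
role = switch_back(True, 0.01)
```

The delay timer is the key design point: without it, a link that recovers and immediately fails again would cause repeated master/backup switchovers.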
E-Trunk Restrictions
To improve the reliability of CE and PE links, and to ensure that traffic can be automatically
switched between these links, the configurations on both ends of the E-Trunk link must be
consistent. Use the networking in Figure 1-319 as an example.
The Eth-Trunk link directly connecting PE1 to the CE and the Eth-Trunk link directly
connecting PE2 to the CE must be configured with the same working rate and duplex
mode. This ensures that both Eth-Trunk interfaces have the same key and join the same
E-Trunk group.
Peer IP addresses must be specified for the PEs to ensure Layer 3 connectivity. The
address of the local PE is the peer address of the peer PE, and the address of the peer PE
is the peer address of the local PE. Here, it is recommended that the addresses of the PEs
are configured as loopback interface addresses.
The E-Trunk group must be bound to a BFD session.
The two PEs must be configured with the same security key (if necessary).
1.7.3.3 Applications
1.7.3.3.1 Application of Eth-Trunk
Service Overview
As the volume of services deployed on networks increases, the bandwidth provided by a
single P2P physical link working in full-duplex mode cannot meet the requirements of service
traffic.
To increase bandwidth, existing interface boards can be replaced with interface boards of
higher bandwidth capacity. However, this would waste existing device resources and increase
upgrade expenditure. If more links are used to interconnect devices, each Layer 3 interface
must be configured with an IP address, wasting IP addresses.
To increase bandwidth without replacing the existing interface boards or wasting IP address
resources, bundle physical interfaces into a logical interface using Eth-Trunk to provide
higher bandwidth.
Networking Description
As shown in Figure 1-320, traffic of different services is sent to the core network through the
user-end provider edge (UPE) and provider edge-access aggregation gateway (PE-AGG).
Different services are assigned different priorities. To ensure the bandwidth and reliability of
the link between the UPE and the PE-AGG, a link aggregation group, Eth-Trunk 1, is
established.
Feature Deployment
In Figure 1-320, Eth-Trunk interfaces are created on the UPE and PE-AGG, and the physical
interfaces that directly connect the UPE and PE-AGG are added to the Eth-Trunk interfaces.
Eth-Trunk offers the following benefits:
Improved link bandwidth. The maximum bandwidth of the Eth-Trunk link is three times
that of each physical link.
Improved link reliability. If one physical link fails, traffic is switched to another physical
link of the Eth-Trunk link.
Network congestion prevention. Traffic between the UPE and PE-AGG is load-balanced
on the three physical links of the Eth-Trunk link.
Prompt transmission of high-priority packets, with quality of service (QoS) policies
applied to Eth-Trunk interfaces.
You can select the operation mode for the Eth-Trunk as follows:
If devices at both ends of the Eth-Trunk link support the Link Aggregation Control
Protocol (LACP), Eth-Trunk interfaces in static LACP mode are recommended.
If the device at either end of the Eth-Trunk does not support LACP, Eth-Trunk interfaces
in manual load balancing mode are recommended.
Service Overview
Eth-Trunk implements link reliability between single devices. However, if a device fails,
Eth-Trunk does not take effect.
To improve network reliability, carriers introduced the device redundancy method that
requires master and backup devices. If the master device or primary link fails, the backup
device can take over user services. However, in this situation, the master and backup devices
must be dual-homed by a downstream device, and inter-device link reliability must be
ensured.
In dual-homing networking, Virtual Router Redundancy Protocol (VRRP) can be used to
ensure device-level reliability, and Eth-Trunk can be used to ensure link reliability. In some
cases, however, traffic cannot be switched to the backup device and secondary link
simultaneously if the master device or primary link fails. As a result, traffic is interrupted. To
address this issue, use Enhanced Trunk (E-Trunk) to implement both device- and link-level
reliability.
Networking Description
In Figure 1-321, the customer edge (CE) is dual-homed to the virtual private LAN service
(VPLS) network, and Eth-Trunk is deployed on the CE and provider edges (PEs) to
implement link reliability.
In normal situations, the CE communicates with remote devices on the VPLS network
through PE1. If PE1 or the link between the CE and PE1 fails, the CE cannot communicate
with PE1. To ensure that services are not interrupted, deploy an E-Trunk on PE1 and PE2. If
PE1 or the link between the CE and PE1 fails, traffic is switched to PE2. The CE then
continues to communicate with remote devices on the VPLS network through PE2. If PE1 or
the link between the CE and PE1 recovers, traffic is switched back to PE1. An E-Trunk
provides backup between Eth-Trunk links of the PEs, improving device-level reliability.
Feature Deployment
Use an E-Trunk comprised of Eth-Trunk interfaces in LACP mode as an example. Figure
1-321 shows how the Eth-Trunk and E-Trunk are deployed.
Deploy Eth-Trunk interfaces in LACP mode on the CE and PEs and add the interfaces
that directly connect the CE and PEs to the Eth-Trunk interfaces to implement link
reliability.
Deploy an E-Trunk on the PEs and add the Eth-Trunk interfaces in LACP mode to the
E-Trunk to implement device-level reliability.
Definition
Layer 2 protocol tunneling allows Layer 2 devices to use Layer 2 tunneling technology to
transparently transmit Layer 2 protocol data units (PDUs) across a Layer 2 network. Layer 2
protocol tunneling supports standard protocols, such as Spanning Tree Protocol (STP), Link
Aggregation Control Protocol (LACP), as well as user-defined protocols.
Purpose
Layer 2 protocol tunneling ensures transparent transmission of private Layer 2 PDUs over a
public network. The ingress device replaces the multicast destination MAC address in the
received Layer 2 PDUs with a specified multicast MAC address before transmitting them onto
the public network. The egress device restores the original multicast destination MAC address
and then forwards the Layer 2 PDUs to their destinations.
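The MAC rewrite at the tunnel edges described above can be sketched as follows. This is an illustrative model; the replacement multicast MAC address shown is an assumption (the configured address must not be one used by a well-known protocol).

```python
STP_MAC = "0180-c200-0000"       # well-known BPDU destination MAC
TUNNEL_MAC = "010f-e200-0003"    # assumed configured replacement address
MAC_MAP = {STP_MAC: TUNNEL_MAC}

def ingress_rewrite(dst_mac: str) -> str:
    """Ingress edge: replace the protocol MAC so core devices forward
    the PDU instead of sending it to their CPUs for protocol processing."""
    return MAC_MAP.get(dst_mac, dst_mac)

def egress_restore(dst_mac: str) -> str:
    """Egress edge: restore the original protocol MAC before delivering
    the PDU to the user network."""
    reverse = {v: k for k, v in MAC_MAP.items()}
    return reverse.get(dst_mac, dst_mac)

# A BPDU crosses the backbone with the replacement MAC and arrives at
# the user network with its original destination MAC restored.
assert egress_restore(ingress_rewrite(STP_MAC)) == STP_MAC
```

The mapping must be configured identically on the ingress and egress edge devices; otherwise the egress device cannot restore the original destination MAC address.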
1.7.4.2 Principles
1.7.4.2.1 Basic Concepts
Background
Layer 2 protocols running between user networks, such as Spanning Tree Protocol (STP) and
Link Aggregation Control Protocol (LACP), must traverse a backbone network to perform
Layer 2 protocol calculation.
On the network shown in Figure 1-322, User Network 1 and User Network 2 both run a Layer
2 protocol, Multiple Spanning Tree Protocol (MSTP). Layer 2 protocol data units (PDUs) on
User Network 1 must traverse a backbone network to reach User Network 2 to build a
spanning tree. Generally, the destination MAC addresses in Layer 2 PDUs of the same Layer
2 protocol are the same. For example, the MSTP PDUs are BPDUs with the destination MAC
address 0180-C200-0000. Therefore, when a Layer 2 PDU reaches an edge device on a
backbone network, the edge device cannot identify whether the PDU comes from a user
network or the backbone network and sends the PDU to the CPU to calculate a spanning tree.
In Figure 1-322, CE1 on User Network 1 builds a spanning tree together with PE1 but not
with CE2 on User Network 2. As a result, the Layer 2 PDUs on User Network 1 cannot
traverse the backbone network to reach User Network 2.
To resolve the preceding problem, use Layer 2 protocol tunneling. The NE20E supports
tunneling for the following Layer 2 protocols:
Cisco Discovery Protocol (CDP)
Ethernet Local Management Interface (E-LMI)
Ethernet in the First Mile OAM (EOAM3AH)
Device link detection protocol (DLDP)
Dynamic Trunking Protocol (DTP)
Ethernet in the First Mile (EFM)
GARP Multicast Registration Protocol (GMRP)
GARP VLAN Registration Protocol (GVRP)
Huawei Group Management Protocol (HGMP)
Link Aggregation Control Protocol (LACP)
BPDU
Bridge protocol data units (BPDUs) are most commonly used by Layer 2 protocols, such as
STP and MSTP. BPDUs are protocol packets multicast between Layer 2 switches. BPDUs of
different protocols have different destination MAC addresses and are encapsulated in
compliance with IEEE 802.3. Figure 1-323 shows the BPDU format.
The specified multicast MAC address cannot be a multicast MAC address used by well-known
protocols.
b. The ingress device then determines whether to add an outer VLAN tag to the Layer
2 PDUs with a specified multicast MAC address based on the configured Layer 2
protocol tunneling type.
When Layer 2 PDUs leave the backbone network,
a. The egress device restores the original multicast destination MAC address in the
Layer 2 PDUs based on the configured mapping between the multicast destination
MAC address and the specified multicast MAC address.
b. The egress device then determines whether to remove the outer VLAN tag from the
Layer 2 PDUs with the original multicast destination MAC address based on the
configured Layer 2 protocol tunneling type.
Layer 2 PDUs can be tunneled across a backbone network if all of the following conditions
are met:
All sites of a user network can receive Layer 2 PDUs from one another.
Layer 2 PDUs of a user network are not processed by the CPUs of backbone network
devices.
Layer 2 PDUs of different user networks must be isolated and not affect each other.
Table 1-92 describes the Layer 2 protocol tunneling types that Huawei devices support.
On the network shown in Figure 1-324, each PE interface connects to one user network, and
each user network belongs to either LAN-A or LAN-B. Layer 2 PDUs from user networks to
PEs on the backbone network do not carry VLAN tags. The PEs, however, must identify
which LAN the Layer 2 PDUs come from. Layer 2 PDUs from a user network in LAN-A
must be sent to the other user networks in LAN-A, but not to the user networks in LAN-B. In
addition, Layer 2 PDUs cannot be processed by PEs. To meet the preceding requirements,
configure interface-based Layer 2 protocol tunneling on backbone network edge devices.
1. The ingress device on the backbone network identifies the protocol type of the received
Layer 2 PDUs and tags them with the default VLAN ID of the interface that has received
them.
2. The ingress device replaces the multicast destination MAC address in the Layer 2 PDUs
with a specified multicast MAC address based on the configured mapping between the
multicast destination MAC address and the specified multicast MAC address.
3. The internal devices on the backbone network forward the Layer 2 PDUs with a
specified multicast MAC address to the egress devices.
4. The egress devices restore the original destination MAC address in the Layer 2 PDUs
based on the configured mapping between the multicast destination MAC address and
the specified multicast address and send the Layer 2 PDUs to the user networks.
If VLAN-based Layer 2 protocol tunneling is used when many user networks connect to a
backbone network, a large number of VLAN IDs of the backbone network are required. This
may result in insufficient VLAN resources. To reduce the consumption of VLAN resources,
configure QinQ on the backbone network to forward Layer 2 PDUs.
For details about QinQ, see 1.7.6 QinQ in NE20E Feature Description - LAN and MAN Access.
On the network shown in Figure 1-326, after QinQ is configured, a PE adds an outer VLAN
ID of 20 to the received Layer 2 PDUs that carry VLAN IDs in the range 100 to 199 and an
outer VLAN ID of 30 to the received Layer 2 PDUs that carry VLAN IDs in the range 200 to
299 before transmitting these Layer 2 PDUs across the backbone network. To tunnel Layer 2
PDUs from the user networks across the backbone network, configure QinQ-based Layer 2
protocol tunneling on PEs' aggregation interfaces.
1. The ingress device on the backbone network adds a different outer VLAN tag (public
VLAN ID) to the received Layer 2 PDUs based on the inner VLAN IDs (user VLAN IDs)
carried in the PDUs.
2. The ingress device replaces the multicast destination MAC address in the Layer 2 PDUs
with a specified multicast MAC address based on the configured mapping between the
multicast destination MAC address and the specified multicast MAC address.
3. The ingress device transmits the Layer 2 PDUs with a specified multicast MAC address
through different Layer 2 tunnels based on the outer VLAN IDs.
4. The internal devices on the backbone network forward the Layer 2 PDUs with a
specified multicast MAC address to the egress devices.
5. The egress devices restore the original destination MAC address in the Layer 2 PDUs
based on the configured mapping between the multicast destination MAC address and
the specified multicast address, remove the outer VLAN tags, and send the Layer 2
PDUs to the user networks based on the inner VLAN IDs.
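Step 1 above, in which the ingress device derives the outer (public) VLAN tag from the inner (user) VLAN ID, can be sketched as follows using the ranges from the example.

```python
def outer_vlan(inner_vlan: int) -> int:
    """Map an inner (user) VLAN ID to the outer (public) VLAN tag,
    using the example ranges: 100-199 -> 20, 200-299 -> 30."""
    if 100 <= inner_vlan <= 199:
        return 20
    if 200 <= inner_vlan <= 299:
        return 30
    raise ValueError(f"inner VLAN {inner_vlan} has no QinQ mapping")

# PDUs tagged 150 and 250 get outer tags 20 and 30 respectively, so
# they travel through different Layer 2 tunnels across the backbone.
print(outer_vlan(150), outer_vlan(250))
```

Because many user VLANs share one outer VLAN, this approach consumes far fewer backbone VLAN IDs than VLAN-based tunneling.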
On the network shown in Figure 1-327, PE1, PE2, and PE3 constitute a backbone network.
LAN-A and LAN-C belong to VLAN 3; LAN-B and LAN-D belong to VLAN 2. All LANs
send tagged Layer 2 PDUs. CE1 can forward Layer 2 PDUs carrying VLAN 2 and VLAN 3.
CE2 can forward Layer 2 PDUs carrying VLAN 3. CE3 can forward Layer 2 PDUs carrying
VLAN 2. CE1, CE2, and CE3 also run an untagged Layer 2 protocol, such as LLDP.
PEs therefore receive both tagged and untagged Layer 2 PDUs. To transparently transmit both
tagged and untagged Layer 2 PDUs, configure hybrid VLAN-based Layer 2 protocol
tunneling on backbone network edge devices.
1.7.4.3 Applications
1.7.4.3.1 Untagged Layer 2 Protocol Tunneling Application
When each edge device interface on a backbone network connects to only one user network
and Layer 2 protocol data units (PDUs) from the user networks do not carry VLAN tags,
configure untagged Layer 2 protocol tunneling to allow the Layer 2 PDUs from the user
networks to be tunneled across the backbone network. Layer 2 PDUs from the user networks
then travel through different Layer 2 tunnels to reach the destinations to perform Layer 2
protocol calculation.
In Figure 1-328, PEs on the backbone network edge must tunnel Layer 2 PDUs from the user
networks across the backbone network.
PE1, PE2, and PE3 constitute a backbone network and use different interfaces to connect to
LAN-A and LAN-B. Layer 2 PDUs from user networks to PEs on the backbone network do
not carry VLAN tags. The PEs, however, must identify which LAN the Layer 2 PDUs come
from. Layer 2 PDUs from a user network in LAN-A must be sent to the other user networks in
LAN-A, but not to the user networks in LAN-B. In addition, Layer 2 PDUs cannot be
processed by PEs. To meet the preceding requirements, configure interface-based Layer 2
protocol tunneling on backbone network edge devices. Multiple Spanning Tree Protocol
(MSTP) runs on the LANs.
To tunnel Layer 2 PDUs from the user network across the backbone network, configure
untagged Layer 2 protocol tunneling on user-side interfaces on PE1, PE2, and PE3.
The Layer 2 protocol tunneling process is as follows:
1. PE1 identifies the protocol type of the Layer 2 PDUs and tags the Layer 2 PDUs with the
default VLAN ID of the interface that has received the Layer 2 PDUs.
2. PE1 replaces the multicast destination MAC address in the Layer 2 PDUs with a
specified multicast MAC address based on the configured mapping between the
multicast destination MAC address and the specified multicast MAC address.
3. The internal devices on the backbone network forward the Layer 2 PDUs with a
specified multicast MAC address to the egress devices.
4. The egress devices PE2 and PE3 restore the original destination MAC address in the
Layer 2 PDUs based on the configured mapping between the multicast destination MAC
address and the specified multicast address and send the Layer 2 PDUs to the user
networks. The Layer 2 PDUs are transparently transmitted.
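The MAC address substitution performed in steps 2 and 4 can be sketched as follows. The addresses are illustrative: the well-known protocol address shown is STP's, and the tunnel multicast address stands for whatever mapping is configured on the PEs.

```python
# Sketch of the Layer 2 protocol tunneling MAC substitution.
# The tunnel multicast MAC below is an assumed configured value.
STP_MAC = "01:80:c2:00:00:00"      # well-known STP destination MAC
TUNNEL_MAC = "01:00:5e:00:01:99"   # assumed configured tunnel multicast MAC

MAPPING = {STP_MAC: TUNNEL_MAC}
REVERSE = {v: k for k, v in MAPPING.items()}

def ingress_pe(frame, pvid):
    """Steps 1-2: tag the PDU with the receiving interface's default
    VLAN ID and replace the protocol destination MAC with the
    configured tunnel multicast MAC."""
    frame = dict(frame, vlan=pvid)
    if frame["dst"] in MAPPING:
        frame = dict(frame, dst=MAPPING[frame["dst"]])
    return frame

def egress_pe(frame):
    """Step 4: restore the original protocol destination MAC before
    sending the PDU back to the user network."""
    if frame["dst"] in REVERSE:
        frame = dict(frame, dst=REVERSE[frame["dst"]])
    return frame

pdu = {"dst": STP_MAC, "vlan": None}
tunneled = ingress_pe(pdu, pvid=10)   # what PE1 sends onto the backbone
restored = egress_pe(tunneled)        # what PE2/PE3 send to the users
```

The internal backbone devices only ever see the tunnel multicast MAC, so they forward the PDUs as ordinary multicast traffic without processing the Layer 2 protocol.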
When Layer 2 PDUs from the user networks carry VLAN tags, configure VLAN-based Layer
2 protocol tunneling to allow the Layer 2 PDUs from the user networks to be tunneled across
the backbone network. Layer 2 PDUs from the user networks then travel through different
Layer 2 tunnels to reach the destinations to perform Layer 2 protocol calculation.
In Figure 1-329, PEs on the backbone network edge must tunnel tagged Layer 2 PDUs from
VLAN 100 and VLAN 200 across the backbone network.
In most circumstances, PEs serve as aggregation devices on a backbone network. PE1, PE2,
and PE3 constitute a backbone network, and the aggregation interfaces on PE1 and PE2
receive Layer 2 PDUs from both LAN-A and LAN-B. To differentiate the Layer 2 PDUs from
the two LANs, the PEs must identify tagged Layer 2 PDUs from CEs, with Layer 2 PDUs
from LAN-A carrying VLAN 200 and those from LAN-B carrying VLAN 100.
To tunnel Layer 2 PDUs from the user network across the backbone network, configure
VLAN-based Layer 2 protocol tunneling on user-side interfaces on PE1, PE2, and PE3.
The Layer 2 protocol tunneling process is as follows:
1. CE1 sends Layer 2 PDUs with a specified VLAN tag to the backbone network.
2. Configure Layer 2 forwarding on the aggregation device PE1 to allow BPDUs that
carry specific VLAN tags to pass through.
3. PE1 receives Layer 2 PDUs from the user networks and identifies that the Layer 2 PDUs
carry a single VLAN tag. PE1 then replaces the multicast destination MAC address in
the Layer 2 PDUs with a specified multicast MAC address and sends the PDUs onto the
backbone network.
4. The internal devices on the backbone network forward the Layer 2 PDUs with a
specified multicast MAC address to the egress devices.
5. The egress devices PE2 and PE3 restore the original destination MAC address in the
Layer 2 PDUs based on the configured mapping between the multicast destination MAC
address and the specified multicast address and send the Layer 2 PDUs to the user
networks. The Layer 2 PDUs are transparently transmitted.
PE1 and PE2 constitute a backbone network and use only VLAN 20 and VLAN 30 for Layer
2 forwarding. CEs send Layer 2 PDUs carrying VLAN 100 and VLAN 200 to the PEs. After
QinQ is configured, a PE adds an outer VLAN ID of 20 to the received Layer 2 PDUs
carrying VLAN 100 and an outer VLAN ID of 30 to the received Layer 2 PDUs carrying
VLAN 200 before transmitting these Layer 2 PDUs across the backbone network. To tunnel
Layer 2 PDUs from the user networks across the backbone network, configure QinQ-based
Layer 2 protocol tunneling on PEs' aggregation interfaces.
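Assuming the inner-to-outer VLAN mapping described above (VLAN 100 to outer VLAN 20, VLAN 200 to outer VLAN 30), the QinQ tag push at the ingress PE and the corresponding pop at the egress PE can be sketched as:

```python
# Sketch of QinQ-based tunneling on a PE aggregation interface: an
# outer (public) VLAN tag is pushed based on the inner (user) VLAN ID.
# The mapping below mirrors the example in the text.
OUTER_FOR_INNER = {100: 20, 200: 30}

def push_outer_tag(frame):
    """Ingress PE: select the outer VLAN from the inner VLAN ID."""
    outer = OUTER_FOR_INNER.get(frame["vlan"])
    if outer is None:
        return None          # inner VLAN not permitted on this interface
    return dict(frame, outer_vlan=outer)

def pop_outer_tag(frame):
    """Egress PE: strip the outer tag before delivery to the user side."""
    f = dict(frame)
    f.pop("outer_vlan", None)
    return f
```

The backbone devices forward only on the outer tag (VLAN 20 or 30), so the user VLAN IDs never need to be provisioned inside the carrier network.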
PE1, PE2, and PE3 constitute a backbone network. LAN-A and LAN-C belong to VLAN 3;
LAN-B and LAN-D belong to VLAN 2. All LANs send tagged Layer 2 PDUs. CE1 can
forward Layer 2 PDUs carrying VLAN 2 and VLAN 3. CE2 can forward Layer 2 PDUs
carrying VLAN 3. CE3 can forward Layer 2 PDUs carrying VLAN 2. CE1, CE2, and CE3
also run an untagged Layer 2 protocol, such as LLDP.
To tunnel both tagged and untagged Layer 2 PDUs from a large number of VLAN users
across the backbone network, configure hybrid tagged and hybrid untagged attributes and
enable both interface-based and VLAN-based Layer 2 protocol tunneling on the user-side
interfaces of PE1, PE2, and PE3.
The Layer 2 protocol tunneling process is as follows:
1. PE1 receives tagged and untagged Layer 2 PDUs and adds the default VLAN ID of the
interface that has received the untagged Layer 2 PDUs to these PDUs.
2. PE1 replaces the multicast destination MAC address in the Layer 2 PDUs with a
specified multicast MAC address based on the configured mapping between the
multicast destination MAC address and the specified multicast MAC address.
3. The internal devices on the backbone network forward the Layer 2 PDUs with a
specified multicast MAC address to the egress devices.
4. The egress devices PE2 and PE3 restore the original destination MAC address in the
Layer 2 PDUs based on the configured mapping between the multicast destination MAC
address and the specified multicast address. They also remove the outer VLAN tags and
send the Layer 2 PDUs to the user networks.
1.7.5 VLAN
1.7.5.1 Introduction
Definition
The Virtual Local Area Network (VLAN) technology logically divides a physical LAN into
multiple VLANs, each of which is a broadcast domain. Each VLAN contains a group of PCs
that have the same requirements. A VLAN has the same attributes as a LAN, but the PCs of a
VLAN can be located on different LAN segments. Hosts within the same VLAN can
communicate with each other, whereas hosts in different VLANs cannot. If two PCs are
located on the same LAN segment but belong to different VLANs, they do not broadcast
packets to each other. In this manner, network security is enhanced.
Purpose
The traditional LAN technology based on the bus structure has the following defects:
Conflicts are inevitable if multiple nodes send messages simultaneously.
Messages are broadcast to all nodes.
Networks have security risks as all the hosts in a LAN share the same transmission
channel.
Such a network forms a single collision domain: the more computers on the network, the
more conflicts occur and the lower the network efficiency becomes. The network is also a
single broadcast domain: when many computers send data, broadcast traffic consumes much
of the bandwidth. Traditional networks therefore face both collision domain and broadcast
domain issues and cannot ensure information security.
To offset these defects, bridges and Layer 2 switches were introduced to improve the
traditional LAN.
Bridges and Layer 2 switches can forward data from the inbound interface to outbound
interface in switching mode. This properly solves the access conflict problem on the shared
media, and limits the collision domain to the port level. Nevertheless, the bridge or Layer 2
switch networking can only solve the problem of the collision domain, but not the problems
of broadcast domain and network security.
In this document, the Layer 2 switch is referred to as the switch for short.
To reduce broadcast traffic, broadcasts should be confined to the hosts that need to
communicate with each other, isolating the hosts that do not. A router can select routes based
on IP addresses and effectively suppress broadcast traffic between two connected network
segments. The router solution, however, is costly. Therefore, multiple logical LANs, namely
VLANs, were developed on the physical LAN.
In this manner, a physical LAN is divided into multiple broadcast domains, that is, multiple
VLANs. The intra-VLAN communication is not restricted, while the inter-VLAN
communication is restricted. As a result, network security is enhanced.
For example, if different companies in the same building build their LANs separately, it is
costly; if these companies share the same LAN in the building, there may be security
problems.
Benefits
The VLAN technology offers the following benefits:
Saves network bandwidth resources by isolating broadcast domains.
Improves communication security and facilitates service deployment.
1.7.5.2 Principles
1.7.5.2.1 Basic Concepts
Each frame sent by an 802.1Q-capable switch carries a VLAN ID. On a VLAN, Ethernet
frames are classified into the following types:
− Tagged frames: frames with 4-byte 802.1Q tags.
− Untagged frames: frames without 4-byte 802.1Q tags.
Link Types
VLAN links can be divided into the following types:
Access link: a link connecting a host and a switch. Generally, a PC does not know which
VLAN it belongs to, and PC hardware cannot distinguish frames with VLAN tags.
Therefore, PCs send and receive only untagged frames. In Figure 1-334, links between
PCs and the switches are access links.
Trunk link: a link connecting switches. Data of different VLANs are transmitted along a
trunk link. The two ends of a trunk link must be able to distinguish frames with VLAN
tags. Therefore, only tagged frames are transmitted along trunk links. In Figure 1-334,
links between switches are trunk links. Frames transmitted over trunk links carry VLAN
tags.
Port Types
Some ports of a device can identify VLAN frames defined by IEEE 802.1Q, whereas others
cannot. Ports can be divided into four types based on whether they can identify VLAN
frames:
Access port
An access port connects a switch to a host over an access link, as shown in Figure 1-334.
An access port has the following features:
− Allows only frames tagged with the port default VLAN ID (PVID) to pass.
− Adds a PVID to its received untagged frame.
QinQ port
An 802.1Q-in-802.1Q (QinQ) port is a QinQ-enabled port. A QinQ port adds an outer tag
to a single-tagged frame so that the number of available VLANs can meet network
requirements.
Figure 1-336 shows the format of a QinQ frame. The outer tag is a public network tag for
carrying a public network VLAN ID. The inner tag is a private network tag for carrying a
private network VLAN ID.
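The 4-byte tag layout can be illustrated with a short sketch: each 802.1Q tag consists of a 2-byte TPID followed by a 2-byte TCI (3-bit priority, 1-bit CFI, 12-bit VLAN ID). TPID 0x8100 is the standard 802.1Q value; note that some carriers use a different TPID for the outer tag.

```python
import struct

def dot1q_tag(vlan_id, pri=0, tpid=0x8100):
    """Build one 4-byte 802.1Q tag: 2-byte TPID + 2-byte TCI.
    TCI = 3-bit PRI | 1-bit CFI (0 here) | 12-bit VLAN ID."""
    tci = (pri << 13) | (vlan_id & 0x0FFF)
    return struct.pack("!HH", tpid, tci)

# A QinQ frame carries two such tags after the source MAC address:
outer = dot1q_tag(20)     # public network (outer) tag, VLAN 20
inner = dot1q_tag(100)    # private network (inner) tag, VLAN 100
qinq_tags = outer + inner # 8 bytes inserted into the frame header
```

This doubles the tag stack from 4 to 8 bytes, which is why QinQ-aware devices must accept frames slightly larger than the classic Ethernet maximum.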
VLAN Classification
VLANs are classified based on port numbers. The network administrator configures a port
default VLAN ID (PVID) for each port on the switch. When a data frame reaches a port
configured with a PVID, the frame is tagged with the PVID if the frame carries no VLAN tag.
If the data frame already carries a VLAN tag, the switching device does not add another
VLAN tag even if the port is configured with a PVID. Different types of ports process VLAN
frames in different manners.
Basic Principles
To improve frame processing efficiency, frames arriving at a switch must carry a VLAN tag
for uniform processing. If an untagged frame enters a switch port that has a PVID configured,
the port adds a VLAN tag whose VID is the same as the PVID to the frame. If a tagged frame
enters a switch port that has a PVID configured, the port does not add any tag to the frame.
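This PVID rule can be summarized in a few lines (a sketch; frames are modeled as simple dictionaries with an optional VLAN field):

```python
def tag_on_ingress(frame, pvid):
    """An untagged frame entering a port is tagged with the port's PVID;
    a frame that already carries a VLAN tag is left unchanged."""
    if frame.get("vlan") is None:
        return dict(frame, vlan=pvid)
    return frame

untagged = tag_on_ingress({"dst": "host-b"}, pvid=10)   # gets VLAN 10
tagged = tag_on_ingress({"dst": "host-b", "vlan": 5}, pvid=10)  # keeps VLAN 5
```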
The switch processes frames in different ways according to the port type. The following
table describes how each type of port processes a frame.
Hybrid port
− Receiving an untagged frame:
If only the port default vlan command is run on the hybrid port, the port receives the
frame and adds the default VLAN tag to the frame.
If only the port trunk allow-pass command is run, the port discards the frame.
If both the port default vlan and port trunk allow-pass commands are run, the port
receives the frame and adds the VLAN tag with the default VLAN ID specified in the
port default vlan command to the frame.
− Receiving a tagged frame:
If only the port default vlan command is run:
The port accepts the frame if the frame's VLAN ID is the same as the default VLAN ID
of the port.
The port discards the frame if the frame's VLAN ID is different from the default VLAN
ID of the port.
If only the port trunk allow-pass command is run, the port accepts the frame if the
frame's VLAN ID is in the permitted VLAN range of the port.
− Sending a frame:
If only the port default vlan command is run and the frame's VLAN ID is the same as
the default VLAN ID, the port removes the VLAN tag and forwards the frame;
otherwise, the port discards the frame.
If only the port trunk allow-pass command is run, the port forwards the frame if the
frame's VLAN ID is in the permitted VLAN range of the port.
− Usage: A hybrid port can be added to multiple VLANs to send and receive frames for
these VLANs. A hybrid port can connect a switch to a PC or connect a network device
to another network device.
On the network shown in Figure 1-337, the trunk link between Device A and Device B must
support both the intra-VLAN 2 communication and the intra-VLAN 3 communication.
Therefore, the ports at both ends of the trunk link must be configured to be bound to VLAN 2
and VLAN 3. That is, Port 2 on Device A and Port 1 on Device B must belong to both VLAN
2 and VLAN 3.
Host A sends a frame to Host B in the following process:
1. The frame is first sent to Port 4 on Device A.
2. Port 4 adds a tag to the frame. The VID field of the tag is set to 2, the ID of the VLAN
to which Port 4 belongs.
3. Device A checks whether its MAC address table contains an entry for the destination
MAC address of Host B.
− If so, Device A sends the frame to the outbound interface Port 2.
− If not, Device A sends the frame to all interfaces bound to VLAN 2 except for Port
4.
4. Upon receipt of the frame, Port 2 sends the frame to Device B.
5. After receiving the frame, Device B checks whether its MAC address table contains an
entry for the destination MAC address of Host B.
− If so, Device B sends the frame to the outbound interface Port 3.
− If not, Device B sends the frame to all interfaces bound to VLAN 2 except for Port
1.
6. Upon receipt of the frame, Port 3 sends the frame to Host B.
The intra-VLAN 3 communication is similar, and is omitted here.
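The forwarding decision in steps 3 and 5 (known unicast versus flooding within the VLAN) can be sketched as follows; the MAC table entries and port names mirror the example and are illustrative:

```python
def forward(frame, in_port, mac_table, vlan_ports):
    """Return the list of output ports for a tagged frame.
    A known (MAC, VLAN) entry yields its learned port; an unknown
    destination is flooded to every port bound to the same VLAN,
    except the port the frame arrived on."""
    vlan = frame["vlan"]
    out = mac_table.get((frame["dst"], vlan))
    if out is not None:
        return [out]
    return [p for p in vlan_ports[vlan] if p != in_port]

# Device A in the example: Port 2 leads toward Device B.
mac_table = {("HostB-MAC", 2): "Port2"}
vlan_ports = {2: ["Port2", "Port4", "Port5"]}
```

For instance, a frame for Host B goes out only on Port 2, while a frame for an unlearned address is flooded on every VLAN 2 port except the ingress one.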
Layer 3 switching combines both routing and switching techniques to implement routing
on a switch, improving the overall performance of the network. After sending the first
data flow based on a routing table, a Layer 3 switch generates a mapping table, in which
the mapping between the MAC address and the IP address about this data flow is
recorded. If the switch needs to send the same data flow again, it directly sends the data
flow at Layer 2 but not Layer 3 based on the mapping table. In this manner, delays on the
network caused by route selection are eliminated, and data forwarding efficiency is
improved.
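This "route once, switch many times" behavior can be sketched as a flow cache in front of the routing table lookup (a simplified model; real hardware keys the cache on more than the destination IP):

```python
class L3Switch:
    """Sketch of flow-based Layer 3 switching: the first packet of a
    flow is routed via the routing table (slow path); the result is
    cached so later packets bypass route selection (fast path)."""

    def __init__(self, route_lookup):
        self.route_lookup = route_lookup   # slow path: routing table
        self.flow_cache = {}               # dst IP -> (next-hop MAC, port)

    def forward(self, dst_ip):
        if dst_ip in self.flow_cache:
            return self.flow_cache[dst_ip], "fast"   # switched at Layer 2
        result = self.route_lookup(dst_ip)           # routed at Layer 3
        self.flow_cache[dst_ip] = result
        return result, "slow"

sw = L3Switch(lambda ip: ("next-hop-mac", "VLANIF10"))
first = sw.forward("1.1.3.2")    # triggers a route lookup
second = sw.forward("1.1.3.2")   # served from the flow cache
```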
To allow the first data flow to be correctly forwarded based on the routing table, the
routing table must contain correct routing entries. Therefore, configuring a Layer 3
interface and a routing protocol on the Layer 3 switch is required. VLANIF interfaces are
therefore introduced.
A VLANIF interface is a Layer 3 logical interface, which can be configured on either a
Layer 3 switch or a router.
As shown in Figure 1-339, VLAN 2 and VLAN 3 are configured on the switch. You can
then create two VLANIF interfaces on the switch and assign IP addresses to and
configure routes for them. In this manner, VLAN 2 can communicate with VLAN 3.
Layer 3 switching overcomes the defects of the "Layer 2 switch + router" scheme and
implements faster traffic forwarding at a lower cost. Nevertheless, Layer 3 switching has
the following limitations:
− It is applicable only to networks whose interfaces are almost all Ethernet
interfaces.
− It is applicable only to networks with stable routes and few changes in the network
topology.
A PC does not need to know the VLAN to which it belongs. It sends only untagged frames.
After receiving an untagged frame from a PC, a switching device determines the VLAN to which the
frame belongs based on the configured VLAN classification method, such as port-based classification,
and then processes the frame accordingly.
If the frame needs to be forwarded to another switching device, the frame must be transparently
transmitted along a trunk link. Frames transmitted along trunk links must carry VLAN tags to allow
other switching devices to properly forward the frame based on the VLAN information.
Before sending the frame to the destination PC, the switching device connected to the destination PC
removes the VLAN tag from the frame to ensure that the PC receives an untagged frame.
Generally, only tagged frames are transmitted on trunk links; only untagged frames are transmitted on
access links. In this manner, switching devices on the network can properly process VLAN information
and PCs are not concerned about VLAN information.
Background
A VLAN is widely used on switching networks because of its flexible control of broadcast
domains and convenient deployment. On a Layer 3 switch, the interconnection between the
broadcast domains is implemented by using one VLAN with a logical Layer 3 interface.
However, this wastes IP addresses.
Following is an example that shows how IP addresses are wasted.
On the network shown in Table 1-94, VLAN 2 requires 10 host addresses. A subnet address
1.1.1.0/28 with a mask length of 28 bits is assigned to VLAN 2. 1.1.1.0 is the subnet number,
and 1.1.1.15 is the directed broadcast address. Neither of these two addresses can serve as a
host address. In addition, 1.1.1.1, the default gateway address of the subnet, cannot be used
as a host address. The remaining 13 addresses ranging from 1.1.1.2 to 1.1.1.14 can be used
by hosts. In this way, although VLAN 2 needs only 10 addresses, 13 addresses are assigned
to it according to the subnet division.
VLAN 3 requires five host addresses. A subnet address 1.1.1.16/29 with a mask length of 29
bits is assigned to VLAN 3. VLAN 4 requires only one address. A subnet address 1.1.1.24/30
with a mask length of 30 bits is assigned to VLAN 4.
VLAN ID   Subnet Address   Gateway Address   Addresses Excluding Subnet Number and Broadcast Address   Available Host Addresses   Required Host Addresses
2   1.1.1.0/28   1.1.1.1   14   13   10
3   1.1.1.16/29   1.1.1.17   6   5   5
4   1.1.1.24/30   1.1.1.25   2   1   1
The preceding VLANs require a total of 16 (10 + 5 + 1) addresses. However, at least 28 (16 +
8 + 4) addresses are occupied by the common VLANs. In this way, nearly half of the
addresses are wasted. In addition, if only three hosts, instead of 10, are later bound to VLAN
2, the extra addresses cannot be used by other VLANs and are therefore wasted.
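The address arithmetic above can be reproduced with a short script; the per-VLAN assignable count excludes the subnet number, the directed broadcast address, and the gateway address:

```python
import ipaddress

# Subnets and required host counts from Table 1-94.
subnets = {"VLAN2": ("1.1.1.0/28", 10),
           "VLAN3": ("1.1.1.16/29", 5),
           "VLAN4": ("1.1.1.24/30", 1)}

assignable = {}
for vlan, (prefix, needed) in subnets.items():
    net = ipaddress.ip_network(prefix)
    # subtract subnet number, directed broadcast, and gateway
    assignable[vlan] = net.num_addresses - 3

total_occupied = sum(ipaddress.ip_network(p).num_addresses
                     for p, _ in subnets.values())   # 16 + 8 + 4 = 28
total_needed = sum(n for _, n in subnets.values())   # 10 + 5 + 1 = 16
```

Running this confirms that 28 addresses are consumed to satisfy a demand of only 16.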
Meanwhile, this division is inconvenient for later network upgrades and expansion. For
example, suppose you want to add two more hosts to VLAN 4 without changing the IP
addresses already assigned to VLAN 4, but the addresses after 1.1.1.24 have been assigned to
others. In this case, a new subnet with a mask length of 29 bits and a new VLAN must be
assigned to the new hosts. VLAN 4 then has only three hosts, but these hosts are spread
across two subnets, and an additional VLAN is required. This is inconvenient for network
management.
As described above, many IP addresses are used as subnet numbers, directed broadcast
addresses, and default gateway addresses of subnets and therefore cannot be used as host
addresses in VLANs. This reduces addressing flexibility and wastes many addresses. To
solve this problem, VLAN aggregation is used.
Principles
The VLAN aggregation technology, also known as the super VLAN, provides a mechanism
that partitions the broadcast domain by using multiple VLANs in a physical network so that
different VLANs can belong to the same subnet. In VLAN aggregation, two concepts are
involved, namely, super VLAN and sub VLAN.
Super VLAN: Different from a common VLAN, a super VLAN contains only a Layer 3
interface and no physical ports. A super VLAN can be viewed as a logical Layer 3
concept; it is a collection of multiple sub VLANs.
Sub VLAN: A sub VLAN is used to isolate broadcast domains. A sub VLAN contains
only physical ports, and no Layer 3 VLANIF interface can be created for it. A sub VLAN
implements Layer 3 switching through the Layer 3 interface of its super VLAN.
A super VLAN can contain one or more sub VLANs that identify different broadcast domains.
The sub VLAN does not occupy an independent subnet segment. In the same super VLAN, IP
addresses of hosts belong to the subnet segment of the super VLAN, regardless of the
mapping between hosts and sub VLANs.
Therefore, all sub VLANs share the same Layer 3 interface. This conserves the subnet
numbers, default gateway addresses, and directed broadcast addresses that separate subnets
would otherwise consume, and allows different broadcast domains to use addresses in the
same subnet segment. As a result, subnet boundaries are eliminated, addressing becomes
flexible, and the number of idle addresses is reduced.
For example, on the network shown in Table 1-94, VLAN 2 requires 10 host addresses,
VLAN 3 requires 5 host addresses, and VLAN 4 requires 1 host address.
To implement VLAN aggregation, create VLAN 10 and configure VLAN 10 as a super
VLAN. Then assign a subnet address 1.1.1.0/24 with the mask length of 24 to VLAN 10;
1.1.1.0 is the subnet number, and 1.1.1.1 is the gateway address of the subnet, as shown in
Figure 1-341. Address assignment of sub VLANs (VLAN 2, VLAN 3, and VLAN 4) is shown
in Table 1-95.
Table 1-95 Example for assigning Host addresses in VLAN aggregation mode
In VLAN aggregation implementation, sub VLANs are not divided according to the previous
subnet border. Instead, their addresses are flexibly assigned in the subnet corresponding to the
super VLAN according to the required host number.
As Table 1-95 shows, VLAN 2, VLAN 3, and VLAN 4 share one subnet (1.1.1.0/24), one
default gateway address (1.1.1.1), and one directed broadcast address (1.1.1.255). In this
manner, the former subnet numbers (1.1.1.16 and 1.1.1.24), default gateway addresses
(1.1.1.17 and 1.1.1.25), and directed broadcast addresses (1.1.1.15, 1.1.1.23, and 1.1.1.27)
can be used as IP addresses of hosts.
In total, 16 (10 + 5 + 1) addresses are required for the three VLANs. In practice, in this
subnet, 16 addresses (1.1.1.2 to 1.1.1.17) are assigned to the three VLANs. A total of 19 IP
addresses are used: the 16 host addresses plus the subnet number (1.1.1.0), the default
gateway address (1.1.1.1), and the directed broadcast address (1.1.1.255). In the network
segment, the remaining 237 addresses (256 - 19 = 237) are available and can be used by any
host in any sub VLAN.
Inter-VLAN Communication
Introduction
VLAN aggregation ensures that different VLANs use the IP addresses in the same subnet
segment. This, however, leads to the problem of Layer 3 forwarding between sub
VLANs.
In common VLAN mode, hosts in different VLANs communicate with each other
through Layer 3 forwarding via their respective gateways. In VLAN aggregation mode,
the hosts in a super VLAN use IP addresses on the same network segment and share the
same gateway address. Hosts in different sub VLANs belong to the same subnet, so they
expect to communicate through Layer 2 forwarding rather than Layer 3 forwarding
through a gateway. In practice, however, hosts in different sub VLANs are isolated at
Layer 2. As a result, sub VLANs fail to communicate with each other.
To solve the preceding problem, you can use proxy ARP.
For details of proxy ARP, see the chapter "ARP" in the NE20E Feature Description - IP Services.
Layer 3 communication between different sub VLANs
As shown in Figure 1-342, super VLAN VLAN 10 contains sub VLAN 2 and sub VLAN
3.
Figure 1-342 Layer 3 communication between different sub VLANs based on ARP proxy
In this scenario, Host A in VLAN 2 wants to communicate with Host B in VLAN 3,
Host A has no ARP entry for Host B in its ARP table, and the gateway (L3 Switch) has
proxy ARP enabled. The communication process is as follows:
a. After comparing the IP address of Host B (1.1.1.3) with its own IP address, Host A
finds that both IP addresses are on the same network segment 1.1.1.0/24 but that its
ARP table has no ARP entry for Host B.
b. Host A broadcasts an ARP request to ask for the MAC address of Host B.
c. Host B is not in the broadcast domain of VLAN 2, and cannot receive the ARP
request.
d. The proxy-ARP enabled gateway between the sub VLANs receives the ARP
request from Host A and finds that the IP address of Host B 1.1.1.3 is the IP address
of a directly connected interface. Then the gateway broadcasts an ARP request to
all the other sub VLAN interfaces to ask for the MAC address of Host B.
e. After receiving the ARP request, Host B sends an ARP response.
f. After receiving the ARP response from Host B, the gateway replies with its MAC
address to Host A.
g. Both the gateway and Host A have the ARP entry of Host B.
h. Host A sends packets to the gateway, and the gateway forwards the packets from
Host A to Host B at Layer 3. In this way, Host A and Host B can communicate
with each other.
The process in which Host B sends packets to Host A is similar and is not described here.
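The gateway-side decision in steps c through f can be sketched as follows (the gateway MAC address is an assumed placeholder):

```python
GATEWAY_MAC = "00:e0:fc:00:00:01"   # assumed MAC of the super VLAN's VLANIF

def proxy_arp(request, host_location, resolve):
    """Sketch of the proxy ARP decision on the super VLAN gateway.
    request: (sender's sub VLAN, target IP)
    host_location: maps IP address -> sub VLAN of the host
    resolve: stands for the gateway's own ARP request into other sub VLANs
    Returns the MAC to answer with, or None if the request is not proxied."""
    vlan_in, target_ip = request
    target_vlan = host_location.get(target_ip)
    if target_vlan is None or target_vlan == vlan_in:
        return None                      # same broadcast domain: no proxying
    resolve(target_ip, target_vlan)      # gateway ARPs in the other sub VLAN
    return GATEWAY_MAC                   # replies to the requester with its own MAC
```

Host A thus learns the gateway's MAC for Host B's IP address, so its traffic to Host B is handed to the gateway and forwarded at Layer 3.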
Layer 2 communication between a sub VLAN and an external network
As shown in Figure 1-343, in the Layer 2 VLAN communications based on ports, the
received or sent frames are not tagged with the super VLAN ID.
Figure 1-343 Layer 2 communication between a sub VLAN and an external network
Host A sends a frame to Switch 1 through Port 1. Upon receipt, Switch 1 adds a VLAN
tag with VLAN ID 2 to the frame. The VLAN ID 2 is not changed to VLAN 10 on
Switch 1 even though VLAN 2 is a sub VLAN of VLAN 10. When the frame is sent out
of trunk Port 3, it still carries the ID of VLAN 2.
That is, Switch 1 itself never sends frames tagged with VLAN 10. If Switch 1 receives
frames tagged with VLAN 10, it discards them because no physical port belongs to
VLAN 10.
A super VLAN has no physical port. This restriction is enforced as follows:
− If you configure the super VLAN and then the trunk interface, the frames of a super
VLAN are filtered automatically according to the allowed VLAN range set on the
trunk interface.
On the network shown in Figure 1-343, no frame of super VLAN 10 passes through
Port 3 on Switch 1, even though the interface allows frames from all VLANs to
pass through.
− If you configure the trunk interface first and allow all VLAN packets to pass
through, you still cannot configure the super VLAN on Switch 1, because a VLAN
that contains physical ports cannot be configured as a super VLAN.
As for Switch 1, the valid VLANs are just VLAN 2 and VLAN 3, and all frames from
these VLANs are forwarded.
Layer 3 communication between a sub VLAN and an external network
Figure 1-344 Layer 3 communication between a sub VLAN and an external network
As shown in Figure 1-344, Switch 1 is configured with super VLAN 4, sub VLAN 2, sub
VLAN 3, and a common VLAN 10. Switch 2 is configured with two common VLANs,
namely, VLAN 10 and VLAN 20. Suppose that Switch 1 is configured with the route to
the network segment 1.1.3.0/24, and Switch 2 is configured with the route to the network
segment 1.1.1.0/24. Then Host A in sub VLAN 2 that belongs to the super VLAN 4
needs to communicate with Host C in Switch 2.
a. After comparing the IP address of Host C (1.1.3.2) with its own IP address, Host A
finds that the two IP addresses are not on the same network segment.
b. Host A broadcasts an ARP request to ask for the MAC address of the gateway
(Switch 1).
c. After receiving the ARP request, Switch 1 finds the ARP request packet is from sub
VLAN 2 and replies with an ARP response to Host A through sub VLAN 2. The
source MAC address in the ARP response packet is the MAC address of VLANIF 4
for super VLAN 4.
d. Host A learns the MAC address of the gateway.
e. Host A sends the packet to the gateway, with the destination MAC address as the
MAC address of VLANIF 4 for super VLAN 4, and the destination IP address as
1.1.3.2.
f. After receiving the packet, Switch 1 performs Layer 3 forwarding and sends the
packet to Switch 2, with the next hop address 1.1.2.2 and the outbound interface
VLANIF 10.
g. After receiving the packet, Switch 2 performs the Layer 3 forwarding and sends the
packet to Host C through the directly connected interface VLANIF 20.
h. The response packet from Host C reaches Switch 1 after the Layer 3 forwarding on
Switch 2.
i. After receiving the packet, Switch 1 performs the Layer 3 forwarding and sends the
packet to Host A through the super VLAN.
If devices in two VLANs need to communicate using VLAN mapping, the IP addresses of
these devices must be on the same network segment. Otherwise, devices in the two VLANs
must communicate through routes, and VLAN mapping does not take effect.
The NE20E supports only 1 to 1 VLAN mapping. When a VLAN mapping-enabled interface
receives a single-tagged frame, the interface replaces the VLAN ID in the frame with a
specified VLAN ID.
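The 1 to 1 mapping behavior can be sketched in a few lines (frames modeled as dictionaries; the mapping table stands for the VLAN IDs configured on the interface):

```python
def vlan_map(frame, mapping):
    """1 to 1 VLAN mapping: the single VLAN tag of an incoming frame is
    replaced with the configured VLAN ID. Frames whose VLAN ID has no
    configured mapping are left unchanged in this sketch."""
    new_id = mapping.get(frame["vlan"])
    return dict(frame, vlan=new_id) if new_id is not None else frame

# Example: the interface rewrites VLAN 100 to VLAN 200.
mapped = vlan_map({"vlan": 100}, {100: 200})
unmapped = vlan_map({"vlan": 300}, {100: 200})
```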
If a user runs a command to bring a VLAN Down, VLAN damping does not need to be
configured.
Background
On an ME network, users and services are differentiated based on a single VLAN tag or
double VLAN tags carried in packets and then access different Virtual Private Networks
(VPNs) through sub-interfaces. In some special scenarios where the access device does not
support QinQ or a single VLAN tag is used in different services, different services cannot be
distributed to different Virtual Switching Instances (VSIs) or VPN instances.
As shown in Figure 1-346, the High Speed Internet (HSI), Voice over IP (VoIP), and Internet
Protocol Television (IPTV) services all belong to VLAN 10 and are aggregated to the UPE
through a switch. The UPE is connected to the SR and BRAS through Layer 2 virtual private
networks (L2VPNs).
If the UPE does not support QinQ, it cannot differentiate the received HSI, VoIP, and IPTV
services for transmitting them over different Pseudo Wires (PWs). In this case, you can
configure the UPE to resolve the 802.1p priorities, DiffServ Code Point (DSCP) values, or
EthType values of packets. Then, the UPE can transmit different packets over different PWs
based on the 802.1p priorities, DSCP values, or EthType values of the packets.
In a similar manner, if the UPE is connected to the SR and BRAS through L3VPNs, the UPE
can transmit different services through different VPN instances based on the 802.1p priorities
or DSCP values of the packets.
Basic Concepts
As shown in Figure 1-346, sub-interfaces of different types are configured at the attachment
circuit (AC) side of the UPE to transmit packets with different 802.1p priorities, DSCP values,
or EthTypes through different PWs or VPN instances. This implements flexible service access.
Flexible service access through sub-interfaces is a technology that differentiates L2VPN
access based on the VLAN IDs and 802.1p priorities/DSCP values/EthType values in packets.
The sub-interfaces are classified in Table 1-96 based on service identification policies
configured on them.
As shown in Figure 1-347, the 802.1p priority is represented by a 3-bit PRI (priority)
field in a VLAN frame defined in IEEE 802.1Q. The value ranges from 0 to 7. The
greater the value, the higher the priority. When the switching device is congested, the
switching device preferentially sends packets with higher priorities. In flexible service
access, this field is used to identify service types so that different services can access
different L2VPNs/L3VPNs.
The EthType is represented by the 2-byte LEN/ETYPE field, as shown in Figure 1-347. In
flexible service access, this field is used to identify service types based on EthType
values (PPPoE or IPoE) so that different services can access different L2VPNs.
DSCP
As shown in Figure 1-348, the DSCP is represented by the first 6 bits of the Type of
Service (ToS) field in an IPv4 packet header, as defined in relevant standards. The DSCP
guarantees QoS on IP networks. Traffic control on the gateway depends on the DSCP
field.
In flexible service access, this field is used to identify service types so that different
services can access different L2VPNs/L3VPNs.
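How these classification fields are located in a frame can be sketched as follows: the 802.1p priority occupies the top 3 bits of the 16-bit TCI in the VLAN tag, and the DSCP occupies the top 6 bits of the ToS byte in the IPv4 header:

```python
import struct

def dot1p_priority(tci):
    """Extract the 3-bit PRI field from a 16-bit 802.1Q TCI."""
    return (tci >> 13) & 0x7

def dscp(tos_byte):
    """Extract the 6-bit DSCP from the IPv4 ToS byte."""
    return (tos_byte >> 2) & 0x3F

# Example TCI bytes: priority 5, VLAN 100.
tci = struct.unpack("!H", b"\xa0\x64")[0]
```

A PE performing flexible service access reads exactly these bits to decide which PW or VPN instance a packet belongs to.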
Huawei high-end routers can function as PEs. In this scenario, only the configurations of PEs are
mentioned. For detailed configurations of other devices, see the related configuration guides.
You can configure the 802.1p priorities on the CSG through commands.
For details on L2VPNs, see the chapters "VPWS" and "VPLS" in the NE20E Feature
Description - VPN.
Huawei high-end routers can function as PEs. In this scenario, only the configurations of PEs are
mentioned. For detailed configurations of other devices, see the related configuration guides.
You can configure the DSCP values on the CSG through commands.
For details on L2VPNs, see the chapters "VPWS" and "VPLS" in the NE20E Feature Description -
VPN.
Huawei high-end routers can function as PEs. In this scenario, only the configurations of PEs are
mentioned. For detailed configurations of other devices, see the related configuration guides.
You can configure the DSCP values on the CSG through commands.
For details on L3VPNs, see the chapter "BGP/MPLS IP VPN" in the NE20E Feature Description -
VPN.
Huawei high-end routers can function as PEs. In this scenario, only the configurations of PEs are
mentioned. For detailed configurations of other devices, see the related configuration guides.
You can configure the 802.1p priorities on the CSG through commands.
For details on L2VPNs, see the chapter "BGP/MPLS IP VPN" in the NE20E Feature Description -
VPN.
1.7.5.3 Applications
1.7.5.3.1 Port-based VLAN Classification
On the network shown in Figure 1-354, different companies residing in the same business
premises need to isolate service data. The ports used by each company are bound to a
company-specific VLAN. This ensures that each company has its own "virtual switch" or
"virtual workstation".
The Layer 3 device shown in Figure 1-356 can be a router or a Layer 3 switch.
Multiple VLANs belong to different Layer 3 devices.
On the network shown in Figure 1-357, VLAN 2, VLAN 3, and VLAN 4 span different
switches. In this situation, you can configure a VLANIF interface on Device A and
Device B for each VLAN, and then configure static routes or a routing protocol on
Device A and Device B, so that Device A and Device B can communicate over a Layer 3
route.
The Layer 3 device shown in Figure 1-357 can be a router or a Layer 3 switch.
After proxy ARP is configured on the router, the sub VLANs in each super VLAN can
communicate with each other.
Terms
None
1.7.6 QinQ
1.7.6.1 Introduction
Definition
802.1Q-in-802.1Q (QinQ) is a technology that adds another layer of IEEE 802.1Q tag to the
802.1Q tagged packets entering the network. This technology expands the VLAN space by
allowing two 802.1Q tags to be carried in one packet.
Purpose
During intercommunication between Layer 2 LANs based on the traditional IEEE 802.1Q
protocol, when two user networks access each other through a carrier network, the carrier
must assign VLAN IDs to users of different VLANs, as shown in Figure 1-359. User
Network1 and User Network2 access the backbone network through PE1 and PE2 of a carrier
network respectively.
Figure 1-359 Intercommunication between Layer 2 LANs using the traditional IEEE 802.1Q
protocol
To connect VLAN 100 - VLAN 200 on User Network1 to VLAN 100 - VLAN 200 on User
Network2, interfaces connecting CE1, PE1, the P, PE2, and CE2 can be configured to function
as trunk interfaces and to allow packets from VLAN 100 - VLAN 200 to pass through.
This configuration, however, makes user VLANs visible on the backbone network and wastes
the carrier's VLAN ID resources (4094 VLAN IDs are used). In addition, the carrier has to
manage user VLAN IDs, and users do not have the right to plan their own VLANs.
The 12-bit VLAN ID defined in IEEE 802.1Q identifies a maximum of only 4096 VLANs,
which is insufficient to isolate and identify the massive numbers of users on the growing
metro Ethernet (ME) network.
QinQ is therefore developed to expand the VLAN space by adding another 802.1Q tag to an
802.1Q tagged packet. In this way, the number of VLANs increases to 4096 x 4096.
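The space expansion can be sketched in Python (a simplified frame model, not device code): pushing a second 802.1Q tag in front of the customer tag yields 4096 x 4096 possible tag combinations.

```python
TPID_DOT1Q = 0x8100

def push_outer_tag(frame: dict, s_vid: int) -> dict:
    """Add a carrier (outer) 802.1Q tag in front of the existing customer tag."""
    assert 0 <= s_vid < 4096
    tagged = dict(frame)
    tagged["outer"] = {"tpid": TPID_DOT1Q, "vid": s_vid}
    return tagged

frame = {"inner": {"tpid": TPID_DOT1Q, "vid": 100}}  # customer VLAN 100
qinq = push_outer_tag(frame, 2000)                   # carrier VLAN 2000
assert (qinq["outer"]["vid"], qinq["inner"]["vid"]) == (2000, 100)
assert 4096 * 4096 == 16777216                       # double-tag ID space
```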
In addition to expanding VLAN space, QinQ is applied in other scenarios with the
development of the ME network and carriers' requirements on refined operation. The outer
and inner VLAN tags can be used to differentiate users from services. For example, the inner
tag represents a user, while the outer tag represents a service. Moreover, QinQ functions as a
simple and practical VPN technology by transparently transmitting private VLAN services
over a public network. It extends services of a core MPLS VPN to the ME network and
implements an end-to-end VPN.
Because QinQ is easy to use, it has been widely applied on ISP networks, for example, to
carry multiple services on the metro Ethernet. The introduction of selective QinQ has made
QinQ even more popular among ISPs. As the metro Ethernet develops, vendors propose their
own metro Ethernet solutions, and QinQ, with its simplicity and flexibility, plays an
important role in these solutions.
Benefits
QinQ offers the following benefits:
Extends VLANs to isolate and identify more users.
Facilitates service deployment by allowing the inner and outer tags to represent different
information. For example, use the inner tag to identify a user and the outer tag to identify
a service.
Allows ISPs to implement refined operation by providing diversified encapsulation and
termination modes.
QinQ packets carry two VLAN tags when they are transmitted across a carrier network. The
meanings of the two tags are described as follows:
Inner VLAN tag: private VLAN tag that identifies the VLAN to which a user belongs.
Outer VLAN tag: public VLAN tag that is assigned by a carrier to a user.
QinQ Encapsulation
QinQ encapsulation is to add another 802.1Q tag to a single-tagged packet. QinQ
encapsulation is usually performed on UPE interfaces connecting to users.
Dot1q and QinQ VLAN tag termination sub-interfaces do not support transparent transmission of
packets that do not contain a VLAN tag, and discard received packets that do not contain a VLAN tag.
Applications of VLAN tag termination
− Inter-VLAN communication
The VLAN technology is widely used because it allows Layer 2 packets of different
users to be transmitted separately. With the VLAN technology, a physical LAN is
divided into multiple logical broadcast domains (VLANs). Hosts in the same
VLAN can communicate with each other at Layer 2, but hosts in different VLANs
cannot. The Layer 3 routing technology is required for communication between
hosts in different VLANs. The following interfaces can be used to implement
inter-VLAN communication:
VLANIF interfaces on Layer 3 switches
To allow branches to communicate within Company 1 or Company 2 but not between the two
companies, configure QinQ tunneling on PE1 and PE2. The configuration roadmap is as
follows:
On PE1, user packets entering Port 1 and Port 3 are encapsulated with an outer VLAN
tag 10, and user packets entering Port 2 are encapsulated with an outer VLAN tag 20.
On PE2, user packets entering Port 1 and Port 2 are encapsulated with an outer VLAN
tag 20.
Port 4 on PE1 and Port 3 on PE2 allow the packets tagged with VLAN 20 to pass.
Table 1-97 shows planning of outer VLAN tags of Company 1 and Company 2.
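The tagging roadmap above can be expressed as a simple port-to-outer-tag mapping. The following is a Python sketch only; the device and port names follow the description above, and the frame model is an assumption.

```python
# Outer VLAN tag pushed per access port, as planned above.
OUTER_TAG = {
    ("PE1", "Port1"): 10, ("PE1", "Port3"): 10,   # Company 1
    ("PE1", "Port2"): 20,                          # Company 2
    ("PE2", "Port1"): 20, ("PE2", "Port2"): 20,
}

def encapsulate(pe: str, port: str, frame: dict) -> dict:
    """Push the planned outer tag onto a user frame entering (pe, port)."""
    return {"outer_vid": OUTER_TAG[(pe, port)], "inner": frame}

pkt = encapsulate("PE1", "Port3", {"vid": 100})
assert pkt["outer_vid"] == 10
```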
To allow branches to communicate within Company 1 or Company 2 but not between the two
companies, configure Layer 2 selective QinQ on PE1 and PE2.
Table 1-98 shows the planning of outer VLAN tags in the packets entering different
interfaces on PE1 and PE2.
Interface 3 on PE1 or PE2 allows the packets tagged with VLAN 20 to pass.
In Figure 1-364, Device A is a non-Huawei device that uses 0x9100 as the EtherType value,
and Device B is a Huawei device that uses 0x8100 as the EtherType value. To implement
interworking between the Huawei and non-Huawei devices, configure 0x9100 as the
EtherType value in the outer VLAN tag of QinQ packets sent by the Huawei device.
Principles
QinQ mapping maps VLAN tags in user packets to specified tags before the user packets are
transmitted across the public network.
Before sending local VLAN frames, a sub-interface replaces the tags in the local frames
with external VLAN tags.
After receiving frames from external VLANs, a sub-interface replaces the external VLAN
tags with local VLAN tags.
QinQ mapping allows a device to map a user VLAN tag to a carrier VLAN tag, shielding
different user VLAN IDs in packets.
QinQ mapping is deployed on edge devices of a Metro Ethernet. It is applied in but not
limited to the following scenarios:
VLAN IDs deployed at new sites and old sites conflict, but new sites need to
communicate with old sites.
VLAN IDs planned by each site on the public network conflict. These sites do not need
to communicate.
VLAN IDs on both ends of the public network are asymmetric.
Currently, only 1:1 QinQ mapping is supported. When a QinQ mapping-enabled
sub-interface receives a single-tagged packet, the sub-interface replaces the VLAN ID in the
packet with a specified VLAN ID.
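The replacement step can be sketched as follows (a minimal Python illustration; the mapped VLAN IDs are assumed example values, not a recommended plan):

```python
def qinq_map_1to1(frame: dict, vlan_map: dict) -> dict:
    """Replace a single-tagged frame's VLAN ID per the configured 1:1 mapping."""
    mapped = dict(frame)
    mapped["vid"] = vlan_map[frame["vid"]]
    return mapped

# Assumed mapping: user VLAN -> carrier VLAN.
vlan_map = {100: 1100, 200: 1200}
assert qinq_map_1to1({"vid": 100, "payload": "data"}, vlan_map)["vid"] == 1100
```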
The sub-interface for Dot1q VLAN tag termination first identifies the outer VLAN tag and
then generates an ARP entry containing the IP address, MAC address, and outer VLAN tag.
For the upstream traffic, the termination sub-interface strips the Ethernet frame header
(including MAC address) and the outer VLAN tag, and searches the routing table to
perform Layer 3 forwarding based on the destination IP address.
For the downstream traffic, the termination sub-interface encapsulates IP packets with
the Ethernet frame header (including MAC address) and outer VLAN tag according to
ARP entries and then sends IP packets to the target user.
The sub-interface for QinQ VLAN tag termination first identifies double VLAN tags and then
generates an ARP entry containing the IP address, MAC address, and double VLAN tags.
For the upstream traffic, the termination sub-interface strips the Ethernet frame header
(including MAC address) and double VLAN tags, and searches the routing table to
perform Layer 3 forwarding based on the destination IP address.
For the downstream traffic, the termination sub-interface encapsulates IP packets with
the Ethernet frame header (including MAC address) and double VLAN tags according to
ARP entries and then sends IP packets to the target user.
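The upstream/downstream handling above can be reduced to a Python sketch. The ARP entry binds an IP address to a MAC address plus the tags learned when the entry was generated; all addresses and IDs here are assumed example values.

```python
# Assumed ARP entry for a user behind the QinQ termination sub-interface.
arp_table = {"10.1.1.2": {"mac": "aa-bb-cc-00-00-02", "s_vid": 1000, "c_vid": 100}}

def upstream(frame: dict) -> dict:
    """Strip the Ethernet header and both tags; routing sees only the IP packet."""
    return {"dst_ip": frame["dst_ip"]}

def downstream(ip_packet: dict) -> dict:
    """Re-encapsulate from the ARP entry for the destination IP."""
    entry = arp_table[ip_packet["dst_ip"]]
    return {"dst_mac": entry["mac"], "s_vid": entry["s_vid"],
            "c_vid": entry["c_vid"], "dst_ip": ip_packet["dst_ip"]}

out = downstream({"dst_ip": "10.1.1.2"})
assert (out["s_vid"], out["c_vid"]) == (1000, 100)
```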
When PC1 and PC3 want to communicate with each other, PC1 sends an ARP request to PC3
to obtain PC3's MAC address. However, as PC1 and PC3 are in different VLANs, PC3 fails to
receive the ARP request from PC1.
To solve this problem, configure proxy ARP on the sub-interface for Dot1q VLAN tag
termination. The detailed communication process is as follows:
1. PC1 sends an ARP Request message to request PC3's MAC address.
2. After receiving the ARP Request message, the PE checks the destination IP address of
the message and finds that the destination IP address is not the IP address of its
sub-interface for Dot1q VLAN tag termination. Then, the PE searches its ARP table for
the PC3's ARP entry.
− If the PE finds this ARP entry, the PE checks whether inter-VLAN proxy ARP is
enabled.
If inter-VLAN proxy ARP is enabled, the PE sends the MAC address of its
sub-interface for Dot1q VLAN tag termination to PC1.
If inter-VLAN proxy ARP is not enabled, the PE discards the ARP Request
message.
− If the PE does not find this ARP entry, the PE discards the ARP Request message
sent by PC1 and checks whether inter-VLAN proxy ARP is enabled.
If inter-VLAN proxy ARP is enabled, the PE sends an ARP Request message
to PC3. After the PE receives an ARP Reply message from PC3, an ARP entry
of PC3 is generated in the PE's ARP table.
If inter-VLAN proxy ARP is not enabled, the PE does not perform any
operations.
3. After learning the MAC address of the sub-interface for Dot1q VLAN tag termination,
PC1 sends IP packets to the PE based on this MAC address.
After receiving the IP packets, the PE forwards them to PC3.
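The decision logic in steps 2 and 3 can be summarized in a Python sketch, with the PE's behavior reduced to a pure function. The function name and example addresses are assumptions for illustration.

```python
def handle_arp_request(target_ip, arp_table, proxy_enabled, subif_mac):
    """Return the action the PE takes for an ARP request not addressed to it."""
    if target_ip in arp_table:
        # Entry exists: reply with the sub-interface's own MAC, or discard.
        return ("reply", subif_mac) if proxy_enabled else ("discard", None)
    # No entry: discard the request; probe the target only if proxying is on.
    return ("probe_target", None) if proxy_enabled else ("discard", None)

table = {"10.1.3.2": "aa-bb-cc-00-00-03"}   # assumed existing ARP entry
assert handle_arp_request("10.1.3.2", table, True, "PE-MAC") == ("reply", "PE-MAC")
assert handle_arp_request("10.1.3.2", {}, True, "PE-MAC") == ("probe_target", None)
```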
Figure 1-369 Proxy ARP on a sub-interface for Dot1q VLAN tag termination
Figure 1-370 Proxy ARP on a sub-interface for QinQ VLAN tag termination
Figure 1-371 DHCP server on a sub-interface for Dot1q VLAN tag termination
On the network shown in Figure 1-371, the user packet received by the DHCP server carries a
single tag. To enable the sub-interface for Dot1q VLAN tag termination on the DHCP server
to assign an IP address to a DHCP client, configure the DHCP server function on the
sub-interface for Dot1q VLAN tag termination.
Figure 1-372 DHCP server on a sub-interface for QinQ VLAN tag termination
On the network shown in Figure 1-372, the switch has selective QinQ configured, and the
user packet received by the DHCP server carries double tags. To enable the sub-interface for
QinQ VLAN tag termination on the DHCP server to assign an IP address to a DHCP client,
configure the DHCP server function on the sub-interface for QinQ VLAN tag termination.
1. When receiving a DHCP request message, the DHCP relay adds user tag information
into the Option 82 field in the message.
2. When receiving a DHCP reply message (ACK message) from the DHCP server, the
DHCP relay analyzes the DHCP reply and generates a binding table.
3. The DHCP relay checks user packets based on the user tag information.
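Step 1 above can be sketched in Python (the Option 82 field layout is simplified to a dictionary; real relays encode circuit-ID/remote-ID sub-options):

```python
def add_option82(dhcp_request: dict, s_vid: int, c_vid: int) -> dict:
    """Return a copy of the request with the user's tags recorded in Option 82."""
    options = dict(dhcp_request.get("options", {}))
    options[82] = {"s_vid": s_vid, "c_vid": c_vid}
    return {**dhcp_request, "options": options}

req = add_option82({"op": "DISCOVER"}, s_vid=1000, c_vid=100)
assert req["options"][82] == {"s_vid": 1000, "c_vid": 100}
```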
Figure 1-373 DHCP relay on a sub-interface for Dot1q VLAN tag termination
Figure 1-374 DHCP relay on a sub-interface for QinQ VLAN tag termination
On the network shown in Figure 1-375, sub-interfaces for Dot1q VLAN tag termination
specify an outer tag, such as tag 100, to configure a VRRP backup group.
Maintaining the master/backup status of the VRRP backup group
Responding to ARP request messages of users
The PE responds to ARP requests of users regardless of whether their packets contain the
tag specified during the VRRP configuration.
Updating the MAC address entries of the Layer 2 switch
Gratuitous ARP messages are sent periodically to update the MAC entries of the switch
and are copied for all the VLAN tags specified on the sub-interfaces for Dot1q VLAN
tag termination. In this way, the VLANs on the switch can learn virtual MAC addresses.
To improve system performance, the frequency of sending gratuitous ARP messages is
increased only when a master/backup switchover is performed. During stable operation
of VRRP, the frequency of sending gratuitous ARP messages is lowered, and the interval
at which gratuitous ARP packets are sent must be less than the aging time of MAC
entries on the switch.
The preceding working mechanism has the following advantages:
Only one VRRP instance needs to be created for users on the same network segment,
even if they carry different VLAN tags.
VRRP resources are saved.
Hardware resources are saved.
IP addresses are saved.
On the network shown in Figure 1-376, sub-interfaces for QinQ VLAN tag termination
specify double tags, such as inner tag 100 and outer tag 1000, to configure a VRRP backup
group.
Maintaining the master/backup status of the VRRP backup group
Responding to ARP request messages of users
The PE responds to ARP requests of users regardless of whether their packets contain the
tags specified during the VRRP configuration.
Updating the MAC address entries of the Layer 2 switch
Gratuitous ARP messages are sent periodically to update the MAC entries of the switch
and are copied for all the VLAN tags specified on the sub-interfaces for QinQ VLAN tag
termination. In this way, the VLANs on the switch can learn virtual MAC addresses. To
improve system performance, the frequency of sending gratuitous ARP messages is
increased only when a master/backup switchover is performed. During stable operation
of VRRP, the frequency of sending gratuitous ARP messages is lowered, and the interval
at which gratuitous ARP packets are sent must be less than the aging time of MAC
entries on the switch.
The preceding working mechanism has the following advantages:
Only one VRRP instance needs to be created for users on the same network segment,
even if they carry different VLAN tags.
Figure 1-377 L3VPN access through a sub-interface for Dot1q VLAN tag termination
Figure 1-378 L3VPN access through a sub-interface for QinQ VLAN tag termination
Figure 1-379 VPWS access through a sub-interface for QinQ VLAN tag termination
Figure 1-380 VPLS access through a sub-interface for Dot1q VLAN tag termination
VPLS supports point-to-multipoint (P2MP) connections and forwards data by learning MAC
addresses. In this case, VPLS access through a sub-interface for Dot1q VLAN tag termination
can be performed by MAC address learning on the basis of a single VLAN tag. Note that
there are no restrictions on VLAN tags for VPLS access.
Figure 1-381 VPLS access through a sub-interface for QinQ VLAN tag termination
VPLS supports P2MP connections and forwards data by learning MAC addresses. In this
case, VPLS access through a sub-interface for QinQ VLAN tag termination can be performed
by MAC address learning on the basis of double VLAN tags. Note that there are no
restrictions on VLAN tags for VPLS access.
On the network shown in Figure 1-382, when the DSLAM forwards double-tagged multicast
packets to the UPE, the UPE processes the packets as follows based on double-tag contents:
1. When the double-tagged packets carrying an outer S-VLAN tag and an inner C-VLAN
tag are transmitted to the UPE to access the Virtual Switching Instances (VSIs), the UPE
terminates the double tags and binds the packets to the multicast VSIs through Pseudo
Wires (PWs). Then, the PE-AGG terminates PWs and adds multicast VLAN tags to the
packets. Finally, the packets are transmitted to the multicast source. For example, IPTV
packets with S-VLAN 3 and C-VLANs ranging from 1 to 1000 are terminated on the
UPE and then access a PW. The PE-AGG terminates the PW and adds multicast VLAN
8 to the packets. IGMP snooping sets up forwarding entries based on the interface
number, S-VLAN tag, and C-VLAN tag and supports multicast packets with different
C-VLAN tags. Each PW then forwards the multicast packets based on their S-VLAN IDs
and C-VLAN IDs.
2. When the double-tagged packets carrying an outer C-VLAN tag and an inner S-VLAN
tag are transmitted to the UPE, the UPE enabled with VLAN swapping swaps the outer
C-VLAN tag and inner S-VLAN tag. If multicast packets access Layer 2 VLANs, the
packets are processed in mode 1; if multicast packets access VSIs, the packets are
processed in mode 2.
In the traditional model, users access a virtual private network (VPN) through a main
interface. Such a configuration is not flexible because multiple users cannot access the VPN
through the same physical interface. To allow multiple users to access the VPN through the
same physical interface, you can use the QinQ stacking function on different sub-interfaces.
This requires that CE-VLANs on PE1 and PE2 be the same.
On the network shown in Figure 1-383, a QinQ stacking sub-interface on PE1 adds an outer
VLAN tag of the ISP network to its received user packets that carry a VLAN tag ranging from
1 to 200 on sub-interfaces. Then, PE1 sends these packets to the VPWS network.
To solve this problem, the 802.1p value in the inner VLAN tag must be processed on a QinQ
sub-interface. The following three methods are available on a QinQ interface:
Ignore the 802.1p value in the inner VLAN tag and reset the 802.1p value in the outer
VLAN tag.
Automatically map the 802.1p value in the inner VLAN tag to an 802.1p value in the
outer VLAN tag.
Set the 802.1p value in the outer VLAN tag according to the 802.1p value in the inner
VLAN tag.
In Figure 1-386, QinQ supports 802.1p in the following modes:
Pipe mode: A specified 802.1p value is set.
Uniform mode: The 802.1p value in the inner VLAN tag is used.
Maps the 802.1p value in the inner VLAN tag to an 802.1p value in the outer VLAN tag.
Multiple 802.1p values in the inner VLAN tag can be mapped to an 802.1p value in the
outer VLAN tag, but one 802.1p value in the inner VLAN tag cannot be mapped to
multiple 802.1p values in the outer VLAN tag.
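The mapping restriction above can be sketched with a Python dictionary, whose unique keys naturally enforce the rule: several inner values may share one outer value, but one inner value cannot map to two outer values. The mapping plan itself is an assumed example.

```python
# Assumed mapping plan: several inner priorities may share one outer priority.
INNER_TO_OUTER_8021P = {0: 0, 1: 0, 2: 2, 3: 2, 4: 4, 5: 5, 6: 5, 7: 7}

def outer_priority(inner_p: int) -> int:
    """Map the inner-tag 802.1p value to the outer-tag 802.1p value."""
    return INNER_TO_OUTER_8021P[inner_p]

assert outer_priority(1) == 0   # inner 0 and 1 both map to outer 0
assert outer_priority(6) == 5   # inner 5 and 6 both map to outer 5
```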
1.7.6.3 Applications
1.7.6.3.1 User Services on a Metro Ethernet
On the network shown in Figure 1-387, DSLAMs support multiple permanent virtual channel
(PVC) access. A user uses multiple services, such as HSI, IPTV and VoIP.
PVCs are used to carry services that are assigned with different VLAN ID ranges. The
following table lists the VLAN ID ranges for each service.
If a user needs to use the VoIP service, user VoIP packets are sent to a DSLAM over a
specified PVC and assigned with VLAN ID 301. When the packets reach the UPE, an outer
VLAN ID (for example, 2000) is added to the packets. The inner VLAN ID (301) represents
the user, and the outer VLAN ID (2000) represents the VoIP service (the DSLAM location can
also be marked if you add different VLAN tags to packets received by different DSLAMs).
The UPE then sends the VoIP packets to the NPE where the double VLAN tags are terminated.
Then, the NPE sends the packets to an IP core network or a VPN.
HSI and IPTV services are processed in the same way. The difference is that QinQ
termination of HSI services is implemented on the BRAS.
The NPE can generate a Dynamic Host Configuration Protocol (DHCP) binding table to
prevent network attacks. In addition, the NPE can implement DHCP authentication based on
the double tags and has the Virtual Router Redundancy Protocol (VRRP) enabled to ensure
reliable service access.
A carrier deploys the VPLS technology on the IP/MPLS core network and QinQ on the ME
network. Three VLANs are assigned for each site to identify the finance, marketing and other
departments, and the VLAN ID for finance is 100, for marketing is 200, and for others is 300.
An outer VLAN 1000 is encapsulated on a UPE (Packets can be added with different VLAN
tags on different UPEs). The sub-interface bound to a VSI on the NPE connected to the UPE
is in symmetry mode. In this way, users belonging to the same VLAN in different sites can
communicate with each other.
Terms
Term Definition
QinQ interface An interface that can process VLAN frames with a single tag (Dot1q termination) or with double tags (QinQ termination).
Sub-interface for VLAN tag termination An interface that identifies the single or double tags in a packet and removes the single or double tags before sending the packets.
1.7.7 EVC
1.7.7.1 Introduction
Definition
An Ethernet virtual connection (EVC) defines a unified Layer 2 Ethernet transmission and
configuration model. An EVC is defined by the Metro Ethernet Forum (MEF) as an
association between two or more user network interfaces within an Internet service provider
(ISP) network. In the EVC model, a bridge domain functions as a local broadcast domain that
can isolate user networks.
An EVC is a model, rather than a specific service or technique.
Purpose
Figure 1-389 shows the traditional service model supported by the NE20E.
The NE20E's traditional service model has limitations, which are described in Table 1-103.
To address these limitations, the EVC model was introduced, as shown in Figure 1-390.
Table 1-103 provides a comparison between the traditional service model and the EVC model
of the NE20E.
Table 1-103 Comparison between the traditional service model and the EVC model of the NE20E
Benefits
EVC provides an Ethernet service model and a configuration model. EVC simplifies
configuration management, improves operation and maintenance efficiency, and enhances
service expansibility.
1.7.7.2 Principles
1.7.7.2.1 EVC Service Bearing
Table 1-104 lists EVC types defined by the MEF.
Related Concepts
EVC Layer 2 sub-interface
An EVC Layer 2 sub-interface is connected to a BD and a VPWS network but cannot be
directly connected to a Layer 3 network.
BD
A BD is a broadcast domain. VLAN tags are transparent within a BD, and MAC address
learning is based on BDs.
An EVC Layer 2 sub-interface belongs to only one BD. Each EVC Layer 2 sub-interface
functioning as a service access point is added to a specific bridge domain and transmits a
specific type of service, which implements service isolation.
BDIF
A BDIF interface is a Layer 3 logical interface that terminates Layer 2 services and
provides Layer 3 access.
Each BD has only one BDIF interface.
Figure 1-391 shows a diagram of EVC service bearing, involving EFPs, broadcast domains,
and Layer 3 access.
An EVC Layer 2 sub-interface is used as an EVC service access point, on which traffic
encapsulation types and behaviors can be flexibly combined. A traffic encapsulation type and
behavior are grouped into a traffic policy. Traffic policies help implement flexible Ethernet
traffic access.
Traffic encapsulation
A Layer 2 Ethernet network can transmit untagged, single-tagged, and double-tagged
packets. To enable a specific EVC Layer 2 sub-interface to transmit a specific type of
packet, specify an encapsulation type on the EVC Layer 2 sub-interface. Table 1-105
lists traffic encapsulation types supported by Layer 2 sub-interfaces.
On a physical interface, if only one EVC Layer 2 sub-interface is created and the
encapsulation type is Default, all traffic is forwarded through the EVC Layer 2
sub-interface.
If a physical interface has both a Default EVC sub-interface and EVC sub-interfaces of
other traffic encapsulation types (such as Dot1q and QinQ), and all the non-Default EVC
sub-interfaces are Down, traffic precisely matching these non-Default EVC
sub-interfaces will not be forwarded through the Default EVC sub-interface.
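The matching rule above can be illustrated with a Python sketch: traffic that precisely matches a non-Default sub-interface is never handed to the Default one, even when the matching sub-interface is Down. Interface names and VLAN IDs are assumed.

```python
def select_subif(frame_vid, subifs):
    """subifs: list of (name, encap_vid_or_None, is_up); None means Default."""
    for name, vid, up in subifs:
        if vid == frame_vid:                    # precise (Dot1q) match wins
            return name if up else None         # matched but Down: not forwarded
    for name, vid, up in subifs:
        if vid is None and up:                  # fall back to the Default sub-interface
            return name
    return None

subifs = [("ge0/1/0.1", 100, False), ("ge0/1/0.99", None, True)]
assert select_subif(100, subifs) is None        # not diverted to Default
assert select_subif(200, subifs) == "ge0/1/0.99"
```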
Traffic behaviors
Table 1-106 lists traffic behaviors supported by Layer 2 sub-interfaces.
Traffic policies
A traffic policy is a combination of a traffic encapsulation type and a traffic behavior. In
the following example, a traffic policy is used. On the network shown in Figure 1-397,
users accessing PE1 need to communicate with users on other PEs at Layer 2. The
following steps are performed:
− Create a bridge domain on PE1, create an EVC Layer 2 sub-interface on the PE1
interface that users access, configure an encapsulation type on the EVC Layer 2
sub-interface and add the EVC Layer 2 sub-interface to the bridge domain.
− Create a bridge domain with the same ID as that on PE1 on each of the other PEs,
configure EVC Layer 2 sub-interfaces on the PE interfaces that users access, specify
various encapsulation types and behaviors, and add all EVC Layer 2 sub-interfaces
to the bridge domain.
− Create EVC Layer 2 sub-interfaces connecting all PEs except PE1 and add these
sub-interfaces to the same bridge domain.
All user devices must be on the same network segment to help users on PE1 and other PEs successfully
communicate.
PE3 port4 Default push vid 10 Adds a tag with VLAN ID 10 to each received untagged
packet. Removes the tag with VLAN ID 10 from each received single-tagged packet.
Traffic encapsulation types and behaviors can be combined flexibly in policies. Table
1-107 describes traffic policies for transmitting traffic.
Quality of service (QoS) policies can be deployed on Layer 2 sub-interfaces to differentiate services and
properly allocate resources for the services.
Traffic forwarding
Figure 1-398 shows traffic forwarding based on an EVC model when Layer 2
sub-interfaces receive packets carrying two VLAN tags.
Layer 2 sub-interfaces are created on the PE1 and PE2 interfaces connecting to the CEs.
A traffic policy is deployed on each EVC Layer 2 sub-interface, and the sub-interfaces
are added to BD1.
− Packet transmission from CE1 to CE2
When receiving double-tagged packets from CE1, the EVC Layer 2 sub-interface of
port 1 on PE1 matches the packets against its traffic encapsulation and receives only
the packets with the outer VLAN ID 100 and inner VLAN ID 10. The EVC Layer 2
sub-interface removes both VLAN tags from the packets based on its traffic
behavior and then forwards the packets to PE2.
Before the EVC Layer 2 sub-interface of port 1 on PE2 forwards the packets to CE2,
the sub-interface adds the outer VLAN ID 200 and inner VLAN ID 20 to the
packets based on its traffic encapsulation and traffic behavior.
− Packet transmission from CE2 to CE1
When receiving double-tagged packets from CE2, the EVC Layer 2 sub-interface of
port 1 on PE2 matches the packets against its traffic encapsulation and receives only
the packets with the outer VLAN ID 200 and inner VLAN ID 20. The EVC Layer 2
sub-interface removes both VLAN tags from the packets based on its traffic
behavior and then forwards the packets to PE1.
Before the EVC Layer 2 sub-interface of port 1 on PE1 forwards the packets to CE1,
the sub-interface adds the outer VLAN ID 100 and inner VLAN ID 10 to the
packets based on its traffic encapsulation and traffic behavior.
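The CE1-to-CE2 direction above can be sketched in Python (a simplified frame model, not device behavior): PE1 pops the (100, 10) tag pair on ingress, and PE2 pushes (200, 20) on egress.

```python
def pe1_ingress(frame: dict):
    """Match (outer 100, inner 10) and strip both tags; drop anything else."""
    if (frame["outer"], frame["inner"]) != (100, 10):
        return None
    return {"payload": frame["payload"]}

def pe2_egress(frame: dict) -> dict:
    """Push (outer 200, inner 20) before forwarding to CE2."""
    return {"outer": 200, "inner": 20, "payload": frame["payload"]}

out = pe2_egress(pe1_ingress({"outer": 100, "inner": 10, "payload": "data"}))
assert (out["outer"], out["inner"]) == (200, 20)
```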
Broadcast Domain
EVC has a unified broadcast domain model, as shown in Figure 1-399.
Layer 3 Access
A BDIF interface is created for a BD in the EVC model. A BDIF interface terminates Layer 2
services and provides Layer 3 access. Figure 1-400 shows how a BDIF interface forwards
packets between Layer 2 and Layer 3.
A BD is created on the PE and implements Layer 2 forwarding of packets from the user
network. Layer 2 sub-interfaces are created on the user side and bound to the same BD and
are each configured with a traffic policy.
A BDIF interface, which is a virtual interface that implements Layer 3 packet forwarding, is
created based on the BD and assigned an IP address.
When forwarding packets, the BDIF interface matches only the destination MAC address in
each packet.
Layer 2 to Layer 3: When receiving user packets, Layer 2 sub-interfaces process the
packets based on the traffic policies and then forward the packets to the BD. If the
destination MAC address of a user packet is the MAC address of the BDIF interface, the
device removes the Layer 2 header of the packet and performs Layer 3 packet
forwarding based on the routing tables. For all other user packets, the device directly
performs Layer 2 forwarding for them based on the MAC address table.
Layer 3 to Layer 2: When receiving packets, the device searches its routing table for the
outbound BDIF interface and then sends the packets to this interface. The BDIF interface
encapsulates the packets based on the ARP entries. The device then searches its MAC
address table for the outbound interfaces and performs Layer 2 forwarding for the
packets.
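The forwarding decision described above reduces to one test on the destination MAC address. The following Python sketch illustrates it; the MAC addresses and port names are assumed.

```python
BDIF_MAC = "00-e0-fc-00-00-01"   # assumed virtual MAC of the BDIF interface

def forward(frame: dict, mac_table: dict):
    """Route frames addressed to the BDIF MAC; bridge everything else."""
    if frame["dst_mac"] == BDIF_MAC:
        return ("route", frame["dst_ip"])        # strip L2 header, use routing table
    return ("bridge", mac_table.get(frame["dst_mac"], "flood"))

mac_table = {"aa-bb-cc-00-00-02": "port2"}
assert forward({"dst_mac": BDIF_MAC, "dst_ip": "10.1.1.2"}, mac_table) == ("route", "10.1.1.2")
assert forward({"dst_mac": "aa-bb-cc-00-00-02", "dst_ip": None}, mac_table) == ("bridge", "port2")
```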
1.7.7.3 Applications
1.7.7.3.1 Application of EVC Bearing VPLS Services
Service Overview
As enterprises widen their global reach and establish more branches in different regions,
applications such as instant messaging and teleconferencing are becoming more common.
This imposes high requirements on end-to-end (E2E) Datacom technologies. A network
capable of providing point-to-multipoint (P2MP) and multipoint-to-multipoint (MP2MP)
services is paramount to Datacom function implementation. To ensure the security of
enterprise data, secure, reliable, and transparent data channels must be provided for multipoint
transmission.
Generally, enterprises lease virtual switching instances (VSIs) on a carrier network to carry
services between branches.
Networking Description
In Figure 1-401, Branch 1 and Branch 3 belong to one department (the Procurement
department, for example), and Branch 2 and Branch 4 belong to another department (the R&D
department, for example). Services must be isolated between these departments, but each
department can plan their VLANs independently (for example, different service development
teams belong to different VLANs). The enterprise plans to dynamically adjust the departments
but does not want to lease multiple VSIs on the carrier network because of the associated
costs.
Feature Deployment
In the traditional service model supported by the NE20E shown in Figure 1-401, common
sub-interfaces (VLAN type), sub-interfaces for dot1q VLAN tag termination, or
sub-interfaces for QinQ VLAN tag termination are created on the user-side interfaces of the
PEs. These sub-interfaces are bound to different VSIs on the carrier network to isolate
services in different departments. If the enterprise sets up another department, the enterprise
must lease another VSI from the carrier to isolate the departments, increasing costs.
To allow the enterprise to dynamically adjust its departments and reduce costs, the EVC
model can be deployed on the PEs. In the EVC model, multiple BDs are connected to the
same VSI, and the BDs are isolated from each other.
When a packet travels from a BD to a PW, the PE adds the BD ID to the packet as the
outer tag (P-Tag).
When a packet travels from a PW to a BD, the PE searches for the VSI instance based on
the VC label and the BD based on the P-Tag.
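The two directions above can be sketched in Python: the BD ID rides as the outer P-Tag on the PW, so the remote PE can demultiplex frames back to the right BD within the shared VSI. The BD ID and VC label values are assumed.

```python
def bd_to_pw(payload, bd_id: int, vc_label: int) -> dict:
    """Entering the PW: the PE adds the BD ID as the outer P-Tag."""
    return {"vc_label": vc_label, "p_tag": bd_id, "payload": payload}

def pw_to_bd(pw_frame: dict):
    """Leaving the PW: the P-Tag selects the destination BD."""
    return pw_frame["p_tag"], pw_frame["payload"]

pkt = bd_to_pw("data", bd_id=30, vc_label=1025)
assert pw_to_bd(pkt) == (30, "data")
```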
The NE20E also supports the exclusive VSI service mode. This mode is similar to a
traditional service mode in which sub-interfaces are bound to different VSIs to connect to the
VPLS network. Figure 1-403 shows a diagram of the exclusive VSI service mode.
In the exclusive VSI service mode, each VSI is connected to only one BD, and the BD
occupies the VSI resource exclusively.
Service Description
As globalization gains momentum, more and more enterprises set up branches in foreign
countries and requirements for office flexibility are increasing. An urgent demand for carriers
is to provide Layer 2 links for enterprises to set up their own enterprise networks, so that
enterprise employees can conveniently visit enterprise intranets outside their offices.
By combining previous access modes with the current IP backbone network, VPWS prevents
duplicate network construction and saves operation costs.
Networking Description
In the traditional service model supported by the NE20E, common sub-interfaces (VLAN
type), Dot1q VLAN tag termination sub-interfaces, or QinQ VLAN tag termination
sub-interfaces are created on the user-side interfaces of PEs. These sub-interfaces are bound to
different VSIs on the carrier network. If Layer 2 devices use different access modes on a
network, service management and configuration are complicated and difficult. To resolve this
issue, configure an EVC to carry Layer 2 services. This implementation facilitates network
planning and management, driving down enterprise costs.
On the VPWS network shown in Figure 1-404, VPN1 services use the EVC VPWS model.
The traffic encapsulation type and behavior are configured on the PE to ensure service
connectivity within the same VPN instance.
Feature Deployment
1. Create a Layer 2 EVC sub-interface on the PE and specify the traffic encapsulation type
and behavior on the Layer 2 sub-interface.
2. Configure VPWS on the EVC Layer 2 sub-interface.
Terms
Terms Definition
EVC Ethernet Virtual Connection. A model for carrying Ethernet services
over a metropolitan area network (MAN). It is defined by the Metro
Ethernet Forum (MEF). An EVC is a model, rather than a specific
service or technique.
BD bridge domain
1.7.8 STP/RSTP/MSTP
1.7.8.1 Introduction
Definition
Generally, redundant links are used on an Ethernet switching network to provide link backup
and enhance network reliability. The use of redundant links, however, may produce loops,
causing broadcast storms and MAC address table instability. As a result, the communication
quality deteriorates, and the communication service may even be interrupted. The Spanning
Tree Protocol (STP) is introduced to resolve this problem.
The term STP has both a narrow sense and a broad sense.
In the narrow sense, STP refers only to the protocol defined in IEEE 802.1D.
In the broad sense, STP refers to the protocol defined in IEEE 802.1D as well as the Rapid Spanning Tree Protocol (RSTP) defined in IEEE 802.1W and the Multiple Spanning Tree Protocol (MSTP) defined in IEEE 802.1S.
Currently, the following spanning tree protocols are supported:
STP
STP, a management protocol at the data link layer, is used to detect and prevent loops on
a Layer 2 network. STP blocks redundant links on a Layer 2 network and trims a
network into a loop-free tree topology.
The STP topology, however, converges at a slow speed. A port cannot be changed to the
Forwarding state until twice the time specified by the Forward Delay timer elapses.
RSTP
RSTP, as an enhancement of STP, converges a network topology at a faster speed.
In both RSTP and STP, all VLANs share one spanning tree. Packets of different VLANs cannot be load balanced, and packets of some VLANs cannot be forwarded along the spanning tree.
RSTP is backward compatible with STP and can be used together with STP on a
network.
MSTP
MSTP defines a VLAN mapping table in which VLANs are associated with multiple
spanning tree instances (MSTIs). In addition, MSTP divides a switching network into
multiple regions, each of which has multiple independent MSTIs. In this manner, the
entire network is trimmed into a loop-free tree topology, and replication and circular
propagation of packets and broadcast storms are prevented on the network. In addition,
MSTP provides multiple redundant paths to balance VLAN traffic.
MSTP is compatible with STP and RSTP. Table 1-108 shows a comparison between STP,
RSTP, and MSTP.
Purpose
After a spanning tree protocol is configured on an Ethernet switching network, it calculates
the network topology and implements the following functions to remove network loops:
Loop prevention: The potential loops on the network are cut off after redundant links are
blocked.
Link redundancy: When an active path becomes faulty, a redundant link can be activated
to ensure network connectivity.
Benefits
This feature offers the following benefits to carriers:
Compared with dual-homing networking, ring networking requires fewer fibers and transmission resources, reducing resource consumption.
STP prevents broadcast storms. This implements real-time communication and improves
communication reliability.
On the network shown in Figure 1-405, the following situations may occur:
Broadcast storms exhaust network resources.
It is known that loops lead to broadcast storms. In Figure 1-405, STP is not enabled on Device A or Device B. If Host A broadcasts a request, both Device A and Device B receive the request on port 1 and forward it through port 2. Port 2 on Device A then receives the request from port 2 on Device B and forwards it through port 1 on Device A. Similarly, port 2 on Device B receives the request from port 2 on Device A and forwards it through port 1 on Device B. As this transmission repeats, resources on the entire network are exhausted, rendering the network unable to work.
Flapping of MAC address tables damages MAC address entries.
In Figure 1-405, the update of MAC address entries upon the receipt of unicast packets damages the MAC address table.
Assume that no broadcast storm occurs on the network. Host A unicasts a packet to Host
B. If Host B is temporarily removed from the network at this time, the MAC address
entry of Host B on Device A and Device B is deleted. The packet unicast by Host A to
Host B is received by port 1 on Device A. Device A, however, does not have the MAC
address entry of Host B. Therefore, the unicast packet is forwarded to port 2. Then, port
2 on Device B receives the unicast packet from port 2 on Device A and sends it out
through port 1. As such transmission repeats, port 1 and port 2 on Device A and Device
B continuously receive unicast packets from Host A. Therefore, Device A and Device B
update their MAC address entries continuously, causing the MAC address tables to flap.
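The flapping described above can be modeled in a few lines. A minimal sketch, assuming the looped unicast frame arrives alternately on ports 1 and 2 of Device A; the function is illustrative, not device logic:

```python
# Hedged sketch of MAC-table flapping: two bridges in a loop keep receiving
# Host A's unicast frame on alternating ports, so the learned port for
# Host A's MAC address keeps changing.

def learn(mac_table: dict, src_mac: str, in_port: int) -> bool:
    """Update the MAC table; return True if the entry changed (a 'flap')."""
    changed = mac_table.get(src_mac) != in_port
    mac_table[src_mac] = in_port
    return changed

table_a = {}
flaps = 0
# The looped frame arrives alternately on port 1 and port 2 of Device A.
for in_port in [1, 2, 1, 2, 1, 2]:
    if learn(table_a, "HostA", in_port):
        flaps += 1
print(flaps)  # every arrival moves the entry: continuous flapping
```

Every arrival rewrites the learned port, which is exactly the instability STP removes by blocking one link of the loop.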
Basic Design
STP runs at the data link layer. The routers running STP discover loops on the network by
exchanging information with each other and trim the ring topology into a loop-free tree
topology by blocking a certain interface. In this manner, replication and circular propagation
of packets are prevented on the network. In addition, STP prevents the processing
performance of routers from deteriorating.
The routers running STP communicate with each other by exchanging Bridge Protocol Data Units (BPDUs). BPDUs are classified into two types:
Configuration BPDU: used to calculate a spanning tree and maintain the spanning tree
topology.
Topology Change Notification (TCN) BPDU: used to inform associated routers of a
topology change.
Configuration BPDUs contain the following information for routers to calculate the spanning tree.
Root bridge ID: is composed of a root bridge priority and the root bridge's MAC address. Each STP
network has only one root bridge.
Cost of the root path: indicates the cost of the shortest path to the root bridge.
Designated bridge ID: is composed of a bridge priority and a MAC address.
Designated port ID: is composed of a port priority and a port name.
Message Age: specifies the lifetime of a BPDU on the network.
Max Age: specifies the maximum time a BPDU is saved.
Hello Time: specifies the interval at which BPDUs are sent.
Forward Delay: specifies the delay for an interface status transition.
The port priority affects the role of a port in a specified spanning tree instance. For details, see 1.7.8.2.4
STP Topology Calculation.
Path cost
The path cost is a port variable and is used to select a link. STP calculates the path cost
to select a robust link and blocks redundant links to trim the network into a loop-free tree
topology.
On an STP-enabled network, the accumulative cost of the path from a certain port to the
root bridge is the sum of the costs of all the segment paths into which the path is
separated by the ports on the transit bridges.
Table 1-109 shows the path costs defined in IEEE 802.1t. Different router manufacturers
use different path cost standards.
The rate of an aggregated link is the sum of the rates of all Up member links in the aggregated group.
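The accumulation rule above can be sketched directly: the root path cost seen at a port is the sum of the segment costs along the path back to the root bridge. The cost values below follow IEEE 802.1t (for example, 200,000 for a 100 Mbit/s link and 20,000 for a 1 Gbit/s link); the topology is illustrative.

```python
# Hedged sketch of root path cost accumulation: the cost of the path from a
# port to the root bridge is the sum of the costs of the segment paths.

def root_path_cost(segment_costs: list[int]) -> int:
    """Sum the per-segment path costs along the path to the root bridge."""
    return sum(segment_costs)

# Root -- 1 Gbit/s --> bridge B -- 100 Mbit/s --> bridge C
# (IEEE 802.1t costs: 1 Gbit/s = 20000, 100 Mbit/s = 200000)
print(root_path_cost([20000, 200000]))  # 220000
```

Because different manufacturers may use different cost tables, the same physical path can yield different accumulated costs on mixed-vendor networks.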
Three Elements
There are generally three elements used when a ring topology is to be trimmed into a tree
topology: root bridge, root port, and designated port. Figure 1-406 shows the three elements.
Root bridge
The root bridge is the bridge with the smallest BID. The smallest BID is determined by
exchanging configuration BPDUs.
Root port
The root port is the port with the smallest path cost to the root bridge. To be specific, the root port is determined based on the path cost. Among all STP-enabled ports on a network bridge, the port with the smallest root path cost is the root port. Each STP-enabled router has only one root port, but the root bridge has no root port.
Designated port
For description of a designated bridge and designated port, see Table 1-110.
As shown in Figure 1-407, AP1 and AP2 reside on Device A; BP1 and BP2 reside on
Device B; CP1 and CP2 reside on Device C.
− Device A sends configuration BPDUs to Device B through AP1. Device A is the
designated bridge of Device B, and AP1 on Device A is the designated port.
− Two routers, Device B and Device C, are connected to the LAN. If Device B is
responsible for forwarding configuration BPDUs to the LAN, Device B is the
designated bridge of the LAN and BP2 on Device B is the designated port.
Figure 1-407 Networking diagram of the designated bridge and designated port
After the root bridge, root port, and designated port are selected successfully, the entire tree
topology is set up. When the topology is stable, only the root port and the designated port
forward traffic. All the other ports are in the Blocking state and receive only STP protocol
packets instead of forwarding user traffic.
After a router on the STP-enabled network receives configuration BPDUs, it compares its own fields, shown in Table 1-111, with those of the received configuration BPDUs. The four comparison principles are as follows:
During the STP calculation, the smaller the value, the higher the priority.
Smallest BID: used to select the root bridge. Devices running STP select the smallest
BID as the root BID shown in Table 1-111.
Smallest root path cost: used to select the root port on a non-root bridge. On the root
bridge, the path cost of each port is 0.
Smallest sender BID: used to select the root port when a router running STP selects the
root port between two ports that have the same path cost. The port with a smaller BID is
selected as the root port in STP calculation. Assume that the BID of Device B is smaller
than that of Device C in Figure 1-406. If the path costs in the BPDUs received by port A
and port B on Device D are the same, port B becomes the root port.
Smallest PID: used to block the port with a greater PID but not the port with a smaller
PID when the ports have the same path cost. The PIDs are compared in the scenario
shown in Figure 1-408. The PID of port A on Device A is smaller than that of port B. In
the BPDUs that are received on port A and port B, the path costs and BIDs of the sending
routers are the same. Therefore, port B with a greater PID is blocked to cut off loops.
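The four comparison principles above amount to comparing BPDUs as an ordered vector (root BID, root path cost, sender BID, sender PID), where the smaller vector wins at the first differing field. A hedged sketch with illustrative field values; Python tuple ordering matches this field-by-field rule:

```python
# Hedged sketch of STP BPDU comparison: (root BID, root path cost,
# sender BID, sender PID), smallest value wins, compared field by field.

def better_bpdu(a: tuple, b: tuple) -> tuple:
    """Return the superior BPDU; tuple ordering applies the four
    comparison principles in order."""
    return min(a, b)

# Same root BID and root path cost; the smaller sender BID breaks the tie,
# so the BPDU from Device B (BID 4097) beats the one from Device C (BID 8193).
bpdu_from_b = (4096, 200000, 4097, 0x8001)
bpdu_from_c = (4096, 200000, 8193, 0x8001)
assert better_bpdu(bpdu_from_b, bpdu_from_c) == bpdu_from_b
```

The same comparison also covers the PID tiebreak: when even the sender BIDs match, the smaller PID wins and the other port is blocked.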
Port States
Table 1-112 shows the port status of an STP-enabled router.
A Huawei datacom router uses MSTP by default. Port states supported by MSTP are the
same as those supported by STP/RSTP.
The following parameters affect the STP-enabled port states and convergence.
Hello time
The Hello timer specifies the interval at which an STP-enabled router sends
configuration BPDUs and Hello packets to detect link faults.
Modification of the Hello timer takes effect only when it is performed on the root bridge. The root bridge adds certain fields to BPDUs to inform non-root bridges of the change in the interval. After a topology change, TCN BPDUs are sent; this interval, however, is irrelevant to the transmission of TCN BPDUs.
Forward Delay time
The Forward Delay timer specifies the delay for interface status transition. When a link
fault occurs, STP recalculation is performed, causing the structure of the spanning tree to
change. The configuration BPDUs generated during STP recalculation cannot be
immediately transmitted over the entire network. If the root port and designated port
forward data immediately after being selected, transient loops may occur. Therefore, an
interface status transition mechanism is introduced by STP. The newly selected root port
and designated port do not forward data until an amount of time equal to twice the
forward delay has passed. In this manner, the newly generated BPDUs can be transmitted
over the network before the newly selected root port and designated port forward data,
which prevents transient loops.
The Forward Delay timer specifies the duration a port spends in each of the Listening and Learning states.
The port in the Listening or Learning state is blocked, which is key to preventing transient loops.
If the configuration BPDU is sent from the root bridge, the value of Message Age is 0. Otherwise, the
value of Message Age indicates the total time during which a BPDU is sent from the root bridge to the
local bridge, including the delay in transmission. In real world situations, each time a configuration
BPDU passes through a bridge, the value of Message Age increases by 1.
Configuration BPDU
Configuration BPDUs are most commonly used.
During initialization, each bridge actively sends configuration BPDUs. After the network
topology becomes stable, only the root bridge actively sends configuration BPDUs. Other
bridges send configuration BPDUs only after receiving configuration BPDUs from upstream
routers. A configuration BPDU is at least 35 bytes long and includes parameters such as the BID, path cost, and PID. A BPDU is discarded if both its sender BID and Port ID field values are the same as those of the local port; otherwise, the BPDU is processed. In this manner, BPDUs that carry the same information as the local port are not processed.
Table 1-113 shows the format of a BPDU.
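The 35-byte configuration BPDU layout can be decoded with the standard library. A hedged sketch following the IEEE 802.1D field order (Protocol ID, Version, Type, Flags, Root ID, Root Path Cost, Bridge ID, Port ID, and four timer fields carried in units of 1/256 second); the field values below are illustrative:

```python
# Hedged sketch: parsing the fixed 35-byte IEEE 802.1D configuration BPDU.
import struct

CONFIG_BPDU = struct.Struct("!HBBB8sI8sHHHHH")  # 35 bytes, network byte order

def parse_config_bpdu(data: bytes) -> dict:
    (proto, version, bpdu_type, flags, root_id, root_path_cost,
     bridge_id, port_id, msg_age, max_age, hello, fwd_delay) = \
        CONFIG_BPDU.unpack(data[:CONFIG_BPDU.size])
    return {
        "type": bpdu_type,             # 0x00 = configuration BPDU
        "root_id": root_id.hex(),      # root priority + root bridge MAC
        "root_path_cost": root_path_cost,
        "bridge_id": bridge_id.hex(),  # designated bridge ID
        "port_id": port_id,            # designated port ID (priority + number)
        "hello_time_s": hello / 256,   # timers are in 1/256-second units
    }

# Illustrative BPDU: Max Age 20 s, Hello Time 2 s, Forward Delay 15 s.
raw = CONFIG_BPDU.pack(0, 0, 0, 0, bytes(8), 0, bytes(8), 0x8001,
                       0, 20 * 256, 2 * 256, 15 * 256)
assert parse_config_bpdu(raw)["hello_time_s"] == 2.0
```

The 1/256-second timer encoding explains why timer fields are two bytes wide even though the configured values are whole seconds.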
Figure 1-411 shows the Flags field. Only the leftmost and rightmost bits are used in STP.
When a root port receives configuration BPDUs, the router where the root port resides
sends a copy of the configuration BPDUs to the specified ports on itself.
When receiving a configuration BPDU with a lower priority, a designated port
immediately sends its own configuration BPDUs to the downstream router.
TCN BPDU
The contents of TCN BPDUs are simple, including only three fields: Protocol ID, Version, and Type, as shown in Table 1-113. The value of the Type field is 0x80, and a TCN BPDU is only four bytes long.
TCN BPDUs are transmitted by each router to its upstream router to notify the upstream
router of changes in the downstream topology, until they reach the root bridge. A TCN BPDU
is generated in one of the following scenarios:
A port on the router enters the Forwarding state, and at least one designated port resides on the router.
A designated port receives TCN BPDUs and sends a copy toward the root bridge.
As each bridge considers itself the root bridge, the value of the root BID field in the BPDU sent by each
port is recorded as its BID; the value of the Root Path Cost field is the cumulative cost of all links to the
root bridge; the sender BID is the ID of the local bridge; the Port ID is the Port ID (PID) of the local
bridge port that sends the BPDU.
Once a port receives a BPDU with a higher priority than its own, the port extracts certain information from the BPDU and updates its own information accordingly. After saving the updated BPDU, the port stops sending its own BPDUs.
When sending a BPDU, each router fills in the Sender BID field with its own BID. When
a router considers itself the root bridge, the router fills in the Root BID field with its own
BID. As shown in Figure 1-412, Port B on Device B receives a BPDU with a higher
priority from Device A, and therefore considers Device A the root bridge. When another
port on Device B sends a BPDU, the port fills in its Root BID field with DeviceA_BID.
The preceding intercommunication is repeatedly performed between two routers until all
routers consider the same router as the root bridge. This indicates that the root bridge is
selected. Figure 1-413 shows the root bridge selection.
In the Root Path Cost algorithm, after a port receives a BPDU, the port extracts the value of the Root Path Cost field and adds the obtained value to the path cost of the port itself to obtain the root path cost.
The path cost on the port covers only directly-connected path costs. The cost can be manually
configured on a port. If the root path costs on two or more ports are the same, the port that sends a
BPDU with the smallest sender BID value is selected as the root port.
1. After the network topology changes, a downstream router continuously sends Topology
Change Notification (TCN) BPDUs to an upstream router.
2. After the upstream router receives TCN BPDUs from the downstream router, only the
designated port processes them. The other ports may receive TCN BPDUs but do not
process them.
3. The upstream router sets the TCA bit of the Flags field in the configuration BPDUs to 1
and returns the configuration BPDUs to instruct the downstream router to stop sending
TCN BPDUs.
4. The upstream router sends a copy of the TCN BPDUs to the root bridge.
5. Steps 1, 2, 3, and 4 are repeated until the root bridge receives the TCN BPDUs.
6. The root bridge sets the TC bit of the Flags field in the configuration BPDUs to 1 to
instruct the downstream router to delete MAC address entries.
TCN BPDUs are used to inform the upstream router and root bridge of topology changes.
Configuration BPDUs with the Topology Change Acknowledgement (TCA) bit being set to 1 are
used by the upstream router to inform the downstream router that the topology changes are known
and instruct the downstream router to stop sending TCN BPDUs.
Configuration BPDUs with the Topology Change (TC) bit being set to 1 are used by the upstream
router to inform the downstream router of topology changes and instruct the downstream router to
delete MAC address entries. In this manner, fast network convergence is achieved.
Figure 1-415 is used as an example to show how the network topology converges when the
root bridge or designated port of the root bridge becomes faulty.
The root bridge becomes faulty.
Figure 1-417 Diagram of topology changes in the case of a faulty root bridge
As shown in Figure 1-417, if the root bridge becomes faulty, Device B and Device C reselect the root bridge by exchanging configuration BPDUs.
The designated port of the root bridge becomes faulty.
Figure 1-418 Diagram of topology changes in the case of a faulty designated port on the root
bridge
As shown in Figure 1-418, the designated port of the root bridge, port 1, becomes faulty. Port 6 is then selected as the root port through the exchange of configuration BPDUs between Device B and Device C.
In addition, port 6 sends TCN BPDUs after entering the Forwarding state. Once the root bridge receives the TCN BPDUs, it sends BPDUs with the TC bit set to instruct the downstream routers to delete MAC address entries.
Disadvantages of STP
STP ensures a loop-free network but has a slow network topology convergence speed, leading
to service deterioration. If the network topology changes frequently, the connections on the
STP-enabled network are frequently torn down, causing frequent service interruption. Users
can hardly tolerate such a situation.
Disadvantages of STP are as follows:
Port states and port roles are not finely distinguished, which makes STP difficult for beginners to learn and deploy.
A network protocol that precisely defines and distinguishes different situations is likely to outperform the others.
− Ports in the Listening, Learning, and Blocking states do not forward user traffic and appear identical to users.
− From the perspective of use and configuration, the essential differences between ports lie in their roles, not their states.
The root port and designated port can both be in the Listening state or in the Forwarding state.
The STP algorithm determines topology changes after the time set by the timer expires,
which slows down network convergence.
The STP algorithm requires a stable network topology. After the root bridge sends
configuration Bridge Protocol Data Units (BPDUs), other routers forward them until all
bridges on the network receive the configuration BPDUs.
This also slows down topology convergence.
As shown in Figure 1-419, RSTP defines four port roles: root port, designated port,
alternate port, and backup port.
The functions of the root port and designated port are the same as those defined in STP.
The alternate port and backup port are described as follows:
− From the perspective of configuration BPDU transmission:
An alternate port is blocked after learning the configuration BPDUs sent by
other bridges.
A backup port is blocked after learning the configuration BPDUs sent by itself.
− From the perspective of user traffic
An alternate port backs up the root port and provides an alternate path from the
designated bridge to the root bridge.
A backup port backs up the designated port and provides an alternate path
from the root bridge to the related network segment.
Port states and port roles are not necessarily related. Table 1-114 lists states of ports with different roles.
Table 1-114 Comparison between states of STP ports and RSTP ports with different roles
Configuration BPDUs in RSTP are defined differently from those in STP. Port roles are described by using the Flags field.
Compared with STP, RSTP slightly redefines the format of configuration BPDUs.
− The value of the Type field is set to 2 rather than 0. Therefore, an RSTP-enabled router always discards the configuration BPDUs sent by an STP-enabled router.
− The six bits in the middle of the original Flags field, which are reserved in STP, are now used. Such a configuration BPDU is called an RST BPDU, as shown in Figure 1-420.
If the root port fails, the most superior alternate port on the network becomes the
root port and enters the Forwarding state. This is because there must be a path from
the root bridge to a designated port on the network segment connecting to the
alternate port.
When the port role changes, the network topology will change accordingly. For
details, see 1.7.8.2.6 RSTP Implementation.
− Edge ports
In RSTP, a designated port on the network edge is called an edge port. An edge port
directly connects to a terminal and does not connect to any other routers.
An edge port does not receive configuration BPDUs, and therefore does not
participate in the RSTP calculation. It can directly change from the Disabled state to
the Forwarding state without any delay, just like an STP-incapable port. If an edge
port receives bogus BPDUs from attackers, it is deprived of the edge port attributes
and becomes a common STP port. The STP calculation is implemented again,
causing network flapping.
Protection functions
Table 1-115 shows protection functions provided by RSTP.
Root protection: Due to incorrect configurations or malicious attacks on the network, the root bridge may change. If a designated port is enabled with the root protection function, its port role cannot be changed.
P/A Mechanism
To allow a Huawei device to communicate with a non-Huawei device, a proper rapid
transition mechanism needs to be configured on the Huawei device based on the
Proposal/Agreement (P/A) mechanism on the non-Huawei device.
The P/A mechanism helps a designated port to enter the Forwarding state as soon as possible.
As shown in Figure 1-421, the P/A negotiation is performed based on the following port
variables:
1. proposing: When a port is in the Discarding or Learning state, this variable is set to 1.
Additionally, a Rapid Spanning Tree (RST) BPDU with the Proposal field being 1 is sent
to the downstream router.
2. proposed: After a port receives an RST BPDU with the Proposal field being 1 from the
designated port on the peer router, this variable is set to 1, urging the designated port on
this network segment to enter the Forwarding state.
3. sync: After the proposed variable is set to 1, the root port receiving the proposal sets the
sync variable to 1 for the other ports on the same router; a non-edge port receiving the
proposal enters the Discarding state.
4. synced: After a port enters the Discarding state, it sets its synced variable to 1 in the
following manner: If this port is the alternate, backup, or edge port, it will immediately
set its synced variable to 1. If this port is the root port, it will monitor the synced
variables of the other ports. After the synced variables of all the other ports are set to 1,
the root port sets its synced variable to 1, and sends an RST BPDU with the Agreement
field being 1.
5. agreed: After the designated port receives an RST BPDU with the Agreement field being
1 and the port role field indicating the root port, this variable is set to 1. Once the agreed
variable is set to 1, this designated port immediately enters the Forwarding state.
As shown in Figure 1-422, a new link is established between the root bridges Device A and
Device B. On Device B, p2 is an alternate port; p3 is a designated port in the Forwarding state;
p4 is an edge port. The P/A mechanism works in the following process:
1. p0 and p1 become designated ports and send RST BPDUs.
2. After receiving an RST BPDU with a higher priority, p1 realizes that it will become a
root port but not a designated port, and therefore it stops sending RST BPDUs.
3. p0 enters the Discarding state, and sends RST BPDUs with the Proposal field being 1.
4. After receiving an RST BPDU with the Proposal field being 1, Device B sets the sync
variable to 1 for all its ports.
5. As p2 is already blocked, its status remains unchanged; p4 is an edge port and does not participate in the calculation. Therefore, only the non-edge designated port p3 needs to be blocked.
6. After p2, p3, and p4 enter the Discarding state, their synced variables are set to 1. The
synced variable of the root port p1 is then set to 1, and p1 sends an RST BPDU with the
Agreement field being 1 to Device A. Except for the Agreement field, which is set to 1, and the Proposal field, which is set to 0, the RST BPDU is the same as the one received.
7. After receiving this RST BPDU, Device A identifies it as a reply to the proposal that it
just sent, and therefore p0 immediately enters the Forwarding state.
This P/A negotiation process finishes, and Device B continues to perform the P/A negotiation
with its downstream router.
Theoretically, STP can quickly select a designated port. To prevent loops, STP has to wait for
a period of time long enough to determine the status of all ports on the network. All ports can
enter the Forwarding state at least one forward delay later. RSTP is developed to eliminate
this bottleneck by blocking non-root ports to prevent loops. By using the P/A mechanism, the
upstream port can rapidly enter the Forwarding state.
To use the P/A mechanism, ensure that the link between the two routers is a point-to-point (P2P) link in full-duplex mode. If the P/A negotiation fails, a designated port can forward traffic only after twice the forward delay elapses. This delay is the same as that in STP.
On the network shown in Figure 1-423, STP or RSTP is enabled. The broken line shows the
spanning tree. Device F is the root router. The links between Device A and Device D and
between Device B and Device E are blocked. VLAN packets are transmitted by using the
corresponding links marked with "VLAN2" or "VLAN3."
Host A and Host B belong to VLAN 2 but they cannot communicate with each other because
the link between Device B and Device E is blocked and the link between Device C and
Device F denies packets from VLAN 2.
MSTP divides a switching network into multiple regions, each of which has multiple spanning
trees that are independent of each other. Each spanning tree is called a Multiple Spanning Tree
Instance (MSTI) and each region is called a Multiple Spanning Tree (MST) region.
As shown in Figure 1-424, MSTP maps VLANs to MSTIs in the VLAN mapping table. Each
VLAN can be mapped to only one MSTI. This means that traffic of a VLAN can be
transmitted in only one MSTI. An MSTI, however, can correspond to multiple VLANs.
Two spanning trees are calculated:
MSTI 1 uses Device D as the root router to forward packets of VLAN 2.
MSTI 2 uses Device F as the root router to forward packets of VLAN 3.
In this manner, routers within the same VLAN can communicate with each other; packets of
different VLANs are load balanced along different paths.
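The VLAN mapping table described above enforces a one-to-one direction and a one-to-many direction: each VLAN maps to exactly one MSTI, while one MSTI may carry several VLANs. A minimal sketch using the VLANs from the example; a plain dict enforces the one-MSTI-per-VLAN rule by construction:

```python
# Hedged sketch of the MSTP VLAN mapping table: VLAN -> MSTI is a function
# (each VLAN maps to exactly one MSTI), but MSTI -> VLANs is one-to-many.

vlan_to_msti = {2: 1, 3: 2, 4: 1}   # VLANs 2 and 4 share MSTI 1

def vlans_of(msti: int) -> list[int]:
    """Return all VLANs carried by the given MSTI."""
    return sorted(v for v, m in vlan_to_msti.items() if m == msti)

assert vlans_of(1) == [2, 4]   # one MSTI, multiple VLANs
assert vlan_to_msti[3] == 2    # one VLAN, exactly one MSTI
```

With this mapping, VLAN 2 traffic follows MSTI 1 (rooted at Device D) and VLAN 3 traffic follows MSTI 2 (rooted at Device F), which is how the load balancing in the example is achieved.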
MST Region
An MST region contains multiple routers and network segments between them. The routers of
one MST region have the following characteristics:
MSTP-enabled
Same region name
Same VLAN-MSTI mappings
Same MSTP revision level
A LAN can comprise several MST regions that are directly or indirectly connected. Multiple
routers can be grouped into an MST region by using MSTP configuration commands.
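The membership criteria above can be checked mechanically: two MSTP-enabled bridges belong to the same MST region only when their region name, revision level, and VLAN-to-MSTI mappings all match. A hedged sketch (real bridges compare a digest of the mapping table rather than the table itself; a direct comparison is equivalent for illustration):

```python
# Hedged sketch of MST region membership: same region name, same revision
# level, and same VLAN-to-MSTI mapping are all required.

def same_region(a: dict, b: dict) -> bool:
    """Return True if two bridge configurations place them in one MST region."""
    keys = ("region_name", "revision", "vlan_to_msti")
    return all(a[k] == b[k] for k in keys)

bridge1 = {"region_name": "D0", "revision": 1, "vlan_to_msti": {2: 1, 3: 2}}
bridge2 = {"region_name": "D0", "revision": 1, "vlan_to_msti": {2: 1, 3: 2}}
bridge3 = {"region_name": "D0", "revision": 2, "vlan_to_msti": {2: 1, 3: 2}}
assert same_region(bridge1, bridge2)
assert not same_region(bridge1, bridge3)  # revision mismatch splits regions
```

A mismatch in any one of the three values silently splits a planned region in two, which is a common cause of unexpected CST boundaries.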
As shown in Figure 1-426, the MST region D0 contains Device A, Device B, Device C, and
Device D, and has three MSTIs.
Regional Root
Regional roots are classified as Internal Spanning Tree (IST) and MSTI regional roots.
In the region B0, C0, and D0 on the network shown in Figure 1-428, the routers closest to the
Common and Internal Spanning Tree (CIST) root are IST regional roots.
An MST region can contain multiple spanning trees, each called an MSTI. An MSTI regional
root is the root of the MSTI. On the network shown in Figure 1-427, each MSTI has its own
regional root.
MSTIs are independent of each other. An MSTI can correspond to one or more VLANs, but a
VLAN can be mapped to only one MSTI.
Master Bridge
The master bridge is the IST master, which is the router closest to the CIST root in a region,
for example, Device A shown in Figure 1-426.
If the CIST root is in an MST region, the CIST root is the master bridge of the region.
CIST Root
On the network shown in Figure 1-428, the CIST root is the root bridge of the CIST. The
CIST root is a router in A0.
CST
A Common Spanning Tree (CST) connects all the MST regions on a switching network.
If each MST region is considered a node, the CST is calculated by using STP or RSTP based
on all the nodes.
As shown in Figure 1-428, the MST regions are connected to form a CST.
IST
An IST resides within an MST region.
An IST is a special MSTI with the MSTI ID being 0, called MSTI 0.
An IST is a segment of the CIST in an MST region.
As shown in Figure 1-428, the routers in an MST region are connected to form an IST.
CIST
A CIST, calculated by using STP or RSTP, connects all the routers on a switching network.
As shown in Figure 1-428, the ISTs and the CST form a complete spanning tree, the CIST.
SST
A Single Spanning Tree (SST) is formed in either of the following situations:
A router running STP or RSTP belongs to only one spanning tree.
An MST region has only one router.
As shown in Figure 1-428, the router in B0 forms an SST.
Port Role
Based on RSTP, MSTP adds two port roles: the master port and the regional edge port. MSTP ports can therefore be root ports, designated ports, alternate ports, backup ports, edge ports, master ports, and regional edge ports.
The functions of root ports, designated ports, alternate ports, and backup ports have been
defined in RSTP. Table 1-116 lists all port roles in MSTP.
Port Role Description
Root port: A root port is the non-root bridge port closest to the root bridge. Root bridges do not have root ports. Root ports are responsible for sending data to root bridges. As shown in Figure 1-429, Device A is the root; CP1 is the root port on Device C; BP1 is the root port on Device B.
Designated port: The designated port on a router forwards BPDUs to the downstream router. As shown in Figure 1-429, AP2 and AP3 are designated ports on Device A; CP2 is a designated port on Device C.
Alternate port: From the perspective of sending BPDUs, an alternate port is blocked after a BPDU sent by another bridge is received. From the perspective of user traffic, an alternate port provides an alternate path to the root bridge. This path is different from the path through the root port.
Backup port: From the perspective of user traffic, a backup port provides a backup/redundant path to a segment to which a designated port already connects.
Figure 1-429 Root port, designated port, alternate port, and backup port
A port's status is not necessarily tied to its role. Table 1-118 lists the relationships
between port roles and port statuses.
The first 36 bytes of an intra-region or inter-region MST BPDU are the same as those of an
RST BPDU.
Fields from the 37th byte of an MST BPDU are MSTP-specific. The field MSTI
Configuration Messages consists of configuration messages of multiple MSTIs.
Table 1-120 lists the major information carried in an MST BPDU.
Figure 1-431 shows the sub-fields in the MST Configuration Identifier field.
Table 1-121 describes the sub-fields in the MST Configuration Identifier field.
Figure 1-432 shows the sub-fields in the MST Configuration Messages field.
Table 1-122 describes the sub-fields in the MSTI Configuration Messages field.
MSTP Principle
In Multiple Spanning Tree Protocol (MSTP), the entire Layer 2 network is divided into
multiple MST regions, which are interconnected by a single Common Spanning Tree (CST).
In a Multiple Spanning Tree (MST) region, multiple spanning trees are calculated, each of
which is called a Multiple Spanning Tree Instance (MSTI). Among these MSTIs, MSTI 0 is
also known as the internal spanning tree (IST). Like STP, MSTP uses configuration messages
to calculate spanning trees, but the configuration messages are MSTP-specific.
Vectors
Both MSTIs and the CIST are calculated based on vectors, which are carried in Multiple
Spanning Tree Bridge Protocol Data Units (MST BPDUs). Therefore, switching devices
exchange MST BPDUs to calculate MSTIs and the Common and Internal Spanning Tree
(CIST).
Vectors are described as follows:
− The following vectors participate in the CIST calculation:
{root ID, external root path cost, region root ID, internal root path cost, designated
switching device ID, designated port ID, receiving port ID}
− The following vectors participate in the MSTI calculation:
{regional root ID, internal root path cost, designated switching device ID,
designated port ID, receiving port ID}
The priorities of vectors in braces are in descending order from left to right.
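The left-to-right, descending-priority comparison described above is exactly a lexicographic comparison, which the following sketch illustrates. The IDs and costs are invented for the example; in spanning tree calculation the numerically lower vector is superior.

```python
# Illustrative sketch: CIST priority vectors are compared field by field
# from left to right, and the numerically lower vector is superior.

def better_vector(a, b):
    """Python tuples compare lexicographically, matching the vector order."""
    return a if a < b else b

# (root_id, ext_root_path_cost, regional_root_id,
#  int_root_path_cost, designated_bridge_id, designated_port_id, recv_port_id)
v1 = (0x1000, 200, 0x2000, 10, 0x3000, 1, 2)
v2 = (0x1000, 100, 0x9000, 99, 0x9000, 9, 9)

# v2 wins on the second field (lower external root path cost), even though
# every later field is larger: earlier fields have higher priority.
print(better_vector(v1, v2) == v2)  # prints True
```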
Table 1-123 describes the vectors.
Root ID: Identifies the root switching device for the CIST. The root identifier consists
of the priority value (16 bits) and MAC address (48 bits).
External root path cost (ERPC): Indicates the path cost from a CIST regional root to the
root. ERPCs saved on all switching devices in an MST region are the same. If the
CIST Calculation
After completing the configuration message comparison, the switching device with the
highest priority on the entire network is selected as the CIST root. MSTP calculates an IST for
each MST region, and computes a CST to interconnect MST regions. On the CST, each MST
region is considered a switching device. The CST and ISTs constitute a CIST for the entire
network.
MSTI Calculation
In an MST region, MSTP calculates an MSTI for each VLAN based on mappings between
VLANs and MSTIs. Each MSTI is calculated independently. The calculation process is
similar to the process for STP to calculate a spanning tree. For details, see 1.7.8.2.4 STP
Topology Calculation.
MSTIs have the following characteristics:
The spanning tree is calculated independently for each MSTI, and spanning trees of
MSTIs are independent of each other.
MSTP calculates the spanning tree for an MSTI in the manner similar to STP.
Spanning trees of MSTIs can have different roots and topologies.
Each MSTI sends BPDUs in its spanning tree.
The topology of each MSTI is configured by using commands.
A port can be configured with different parameters for different MSTIs.
A port can play different roles or have different status in different MSTIs.
On an MSTP-aware network, a VLAN packet is forwarded along the following paths:
MSTI in an MST region
CST among MST regions
Background
On the network shown in Figure 1-434:
UPEs are deployed at the aggregation layer, running MSTP.
UPE1 and UPE2 are connected by a Layer 2 link.
Multiple rings are connected to UPE1 and UPE2 through different ports.
The routers on the rings reside at the access layer, running STP or RSTP. In addition,
UPE1 and UPE2 work for different carriers, and therefore they need to reside on
different spanning trees whose topology changes do not affect each other.
On the network shown in Figure 1-434, routers and UPEs construct multiple Layer 2 rings.
STP must be enabled on these rings to prevent loops. UPE1 and UPE2 are connected to
multiple access rings that are independent of each other. The spanning tree protocol cannot
calculate a single spanning tree for all routers. Instead, the spanning tree protocol must be
enabled on each ring to calculate a separate spanning tree.
MSTP supports MSTIs, but these MSTIs must belong to one MST region and routers in the
region must have the same configurations. If the routers belong to different regions, MSTP
calculates the spanning tree based on only one instance. Assume that routers on the network
belong to different regions, and only one spanning tree is calculated in one instance. In this
case, the status change of any router on the network affects the stability of the entire network.
On the network shown in Figure 1-434, the routers connected to UPEs support only STP or
RSTP but not MSTP. When MSTP-enabled UPEs receive RST BPDUs from the routers, the
UPEs consider that they and routers belong to different regions. As a result, only one spanning
tree is calculated for the rings composed of UPEs and routers, and the rings affect each other.
To prevent this problem, MSTP multi-process is introduced. MSTP multi-process is an
enhancement to MSTP. The MSTP multi-process mechanism allows ports on routers to be
bound to different processes. MSTP calculation is performed based on processes. In this
manner, only ports that are bound to a process participate in the MSTP calculation for this
process. With the MSTP multi-process mechanism, spanning trees of different processes are
calculated independently and do not affect each other. The network shown in Figure 1-434
can be divided into multiple MSTP processes by using MSTP multi-process. Each process
takes charge of a ring composed of routers. The MSTP processes have the same functions and
support MSTIs. The MSTP calculation for one process does not affect the MSTP calculation
for another process.
Purpose
On the network shown in Figure 1-434, MSTP multi-process is configured to implement the
following:
Greatly improves applicability of STP to different networking conditions.
To help a network running different spanning tree protocols run properly, you can bind
the routers running different spanning tree protocols to different processes. In this
manner, every process calculates a separate spanning tree.
Improves the networking reliability. For a network composed of many Layer 2 access
devices, using MSTP multi-process reduces the adverse effect of a single node failure on
the entire network.
The topology is calculated for each process. If a device fails, only the topology
corresponding to the process to which the device belongs changes.
Reduces the network administrator workload during network expansion, facilitating
operation and maintenance.
To expand a network, you only need to configure new processes, connect the processes
to the existing network, and keep the existing MSTP processes unchanged. If device
expansion is performed in a process, only this process needs to be modified.
Implements separate Layer 2 port management.
An MSTP process manages some of the ports on a router. Layer 2 ports on a router
can be separately managed by multiple MSTP processes.
Principles
Public link status
As shown in Figure 1-434, the public link between UPE1 and UPE2 is a Layer 2 link
running MSTP. The public link between UPE1 and UPE2 is different from the links
connecting routers to UPEs. The ports on the public link need to participate in the
calculation for multiple access rings and MSTP processes. Therefore, the UPEs must
identify the process from which MST BPDUs are sent.
In addition, a port on the public link participates in the calculation for multiple MSTP
processes, and obtains different status. As a result, the port cannot determine its status.
To prevent this situation, it is defined that a port on a public link always adopts its status
in MSTP process 0 when participating in the calculation for multiple MSTP processes.
After a router starts normally, MSTP process 0 exists by default, and MSTP configurations in the
system view and interface view belong to this process.
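The rule that a public-link port bound to several processes always adopts its status in MSTP process 0 can be sketched as follows. This is a hypothetical model, not vendor code; the class and status names are invented for illustration.

```python
# Hypothetical model of MSTP multi-process port binding: a public-link
# port participating in several processes adopts the status calculated
# by MSTP process 0.

class MultiProcessPort:
    def __init__(self, name):
        self.name = name
        self.status_by_process = {}  # process ID -> "forwarding"/"discarding"

    def set_status(self, process_id, status):
        self.status_by_process[process_id] = status

    def effective_status(self):
        # Public-link rule: the status from process 0 takes precedence.
        if 0 in self.status_by_process:
            return self.status_by_process[0]
        # An access-ring port bound to a single process uses that result.
        return next(iter(self.status_by_process.values()), "discarding")


port = MultiProcessPort("UPE1-to-UPE2")
port.set_status(1, "discarding")  # result from access ring 1's process
port.set_status(2, "forwarding")  # result from access ring 2's process
port.set_status(0, "forwarding")  # result from process 0
print(port.effective_status())    # prints forwarding
```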
Reliability
On the network shown in Figure 1-435, after the topology of a ring changes, the MSTP
multi-process mechanism helps UPEs flood a TC packet to all routers on the ring and
prevent the TC packet from being flooded to routers on the other ring. UPE1 and UPE2
update MAC and ARP entries on the ports corresponding to the changed spanning tree.
On the network shown in Figure 1-436, if the public link between UPE1 and UPE2 fails,
multiple routers that are connected to the UPEs will unblock their blocked ports.
Assume that UPE1 is configured with the highest priority, UPE2 with the second highest
priority, and routers with default or lower priorities. After the link between UPE1 and
UPE2 fails, the blocked ports (replacing the root ports) on routers no longer receive
packets with higher priorities and re-perform state machine calculation. If the
calculation changes the blocked ports to designated ports, a permanent loop occurs, as
shown in Figure 1-437.
Solutions
To prevent a loop between access rings, use either of the following solutions:
− Configure root protection between UPE1 and UPE2.
If all physical links between UPE1 and UPE2 fail, configuring an inter-board
Eth-Trunk link cannot prevent the loop. Root protection can be configured to
prevent the loop shown in Figure 1-437.
Use the blue ring shown in Figure 1-438 as an example. UPE1 is configured with
the highest priority, UPE2 with the second highest priority, and routers on the blue
ring with default or lower priorities. In addition, root protection is enabled on
UPE2.
Assume that a port on S1 is blocked. When the public link between UPE1 and
UPE2 fails, the blocked port on S1 begins to calculate the state machine because it
no longer receives BPDUs of higher priorities. After the calculation, the blocked
port becomes the designated port and performs P/A negotiation with the
downstream router.
After S1, which is directly connected to UPE2, sends BPDUs of higher priorities to
the UPE2 port enabled with root protection, the port is blocked. From then on, the
port remains blocked because it continues receiving BPDUs of higher priorities. In
this manner, no loop will occur.
Unless otherwise specified, STP in this document includes STP defined in IEEE 802.1D, RSTP defined
in IEEE 802.1W, and MSTP defined in IEEE 802.1S.
On the network shown in Figure 1-439, users access the VPLS network through a ring
network that is comprised of CE1, CE2, PE1, and PE2. The PEs are fully connected on
the VPLS network. The packet forwarding process is as follows (using the forwarding of
broadcast or unknown unicast packets from CE1 as an example):
a. After CE1 receives a broadcast or unknown unicast packet, it forwards the packet to
both PE1 and CE2.
b. After PE1 (CE2) receives the packet, it cannot find the outbound interface based on
the destination MAC address of the packet, and therefore broadcasts the packet.
c. After PE2 receives the packet, it also broadcasts the packet. Because PEs do not
forward data received from a PW back to the PW, PE2 (PE1) sends the packet to a
CE and the remote PE.
As a result, a loop occurs on the path CE1 -> CE2 -> PE2 -> PE1 -> CE1 or the path
CE1 -> PE1 -> PE2 -> CE2 -> CE1. The CEs and PEs all receive duplicate traffic.
Solution
To address this problem, enable STP on CE1, CE2, PE1, and PE2; deploy an mPW
between PE1 and PE2, deploy a service PW between PE1 and the PE and between PE2
and the PE, and associate service PWs with the mPW; enable MSTP for the mPW and
AC interfaces so that the mPW can participate in STP calculation and block a CE
interface to prevent duplicate traffic. In addition, configure PE1 and PE2 as the root
bridge and secondary root bridge so that the blocked port resides on the link between the
CEs.
As shown in Figure 1-440, STP is enabled globally on PE1, PE2, CE1, and CE2; an
mPW is deployed between PE1 and PE2; STP is enabled on GE 1/0/1 on PE1 and PE2
and on GE 1/0/1 and GE 1/0/2 on CE1 and CE2. PE2 is configured as the primary root
bridge and PE1 is configured as the secondary root bridge (determined by the bridge
priority) to block the port connecting CE2 to CE1. After STP calculation and association
between the mPW and service PWs are implemented, remote devices no longer receive
duplicate traffic.
Reliability
On the network shown in Figure 1-441, the mPW does not detect a fault on the link
between the PE and PE2, because the PE can still reach PE1 and a new service PW can
be created. In addition, the STP topology remains unchanged; therefore, the blocked
port is unchanged and STP recalculation is not required.
If the STP topology changes, each node sends a TCN BPDU to trigger the updating of
local MAC address entries. In addition, the TCN BPDU triggers the PW to send MAC
Withdraw packets to instruct the remote device to update the learned MAC address
entries locally. In this manner, traffic is switched to an available link.
As shown in Figure 1-442, if the mPW between PE1 and PE2 fails, the ring network
topology is recalculated, and the blocked port on CE2 is unblocked and enters the
Forwarding state. In this situation, the remote PE receives permanent duplicate packets.
To resolve this problem, configure root protection on the secondary root bridge PE1's GE
1/0/1 connecting to CE1. As shown in Figure 1-443, if the mPW between PE1 and PE2
fails, PE1's GE 1/0/1 is blocked because it receives BPDUs with higher priorities. As the
link along the path PE1 -> CE1 -> CE2 -> PE2 is working properly, PE1's blocked port
keeps receiving BPDUs with higher priorities, and therefore this port remains in the
blocked state. This prevents the remote PE from receiving duplicate traffic.
Load balancing
As shown in Figure 1-444, MSTP is enabled for ports connecting PEs and CEs, for the
mPW between PE1 and PE2, and for ports connecting CE1 and CE2. MSTP is globally
enabled on PE1, PE2, CE1, and CE2. After PE1 is configured as the primary root bridge
and PE2 is configured as the backup root bridge (determined by bridge priority), MSTP
calculation is performed to block the port connecting CE1 and CE2. A mapping is
configured between VLANs and MSTIs to implement load balancing.
Option A problem
In inter-AS VPLS Option A mode, redundant connections are established between ASs,
and broadcast and unknown unicast packets may be forwarded in a loop. As shown in
Figure 1-445, VPLS#AS1 and VPLS#AS2 are connected by two links to improve
reliability. After Option A is adopted, fully connected PWs between PEs and ASBRs in
an AS are configured with split horizon to prevent loops, but broadcast and unknown
unicast packets are looped between ASBRs. PEs receive duplicate packets even if
ASBRs in a VPLS AS are not connected.
Dual protection of Option A
To resolve inter-AS loops, configure STP on ASBRs between ASs to break off the loops,
as shown in Figure 1-446. STP is running on Layer 2 ports, so Layer 2 links are required.
If Layer 2 links do not exist between ASBRs, PWs or Layer 3 ports must be added. STP
blocks a link on the inter-AS ring network to prevent broadcast and unknown unicast
packets from being forwarded in a loop and the remote PE from receiving duplicate
traffic.
− As shown in Figure 1-449, PWs between ASBRs are fully connected. By using the
MSTP multi-process feature, E-STP associates mPWs with MSTP processes.
Processes are independent of each other, and therefore the mPWs are independent
of each other. Multiple service PWs are associated with an mPW. After the mPW is
blocked, the associated service PWs are also blocked. This helps break off the loop
between VPLS ASs and perform load balancing by blocking an interface as
required.
1.7.8.5 Applications
1.7.8.5.1 STP Application
On a complex network, loops are inevitable. To provide network redundancy, network
designers tend to deploy multiple physical links between two devices, one of which is the
master and the others are backups. Loops are likely or even bound to occur in such a
situation. Loops can cause flapping of MAC address tables and therefore damage MAC
address entries.
On the network shown in Figure 1-450, after CE and PE running STP discover loops on the
network by exchanging information with each other, they trim the ring topology into a
loop-free tree topology by blocking a certain port. In this manner, replication and circular
propagation of packets are prevented on the network and the switching devices are released
from processing duplicated packets, thereby improving their processing performance.
Terms
Term Definition
STP Spanning Tree Protocol. A protocol used in the local area network (LAN)
to eliminate loops. Devices running STP discover loops in the network by
exchanging information with each other, and block certain interfaces to
eliminate loops.
RSTP Rapid Spanning Tree Protocol. A protocol described in detail in IEEE
802.1w. RSTP modifies and supplements STP and therefore implements
faster convergence than STP.
MSTP Multiple Spanning Tree Protocol. A spanning tree protocol defined in
IEEE 802.1s that introduces the concepts of region and instance. To meet
different requirements, MSTP divides a large network into regions in which
multiple spanning tree instances (MSTIs) are created. These MSTIs are
mapped to virtual LANs (VLANs), and bridge protocol data units (BPDUs)
carrying information about regions and instances are transmitted between
network bridges. A network bridge can therefore determine, from the
BPDU information, the region to which it belongs.
Multi-instance RSTP runs within regions, whereas RSTP-compatible
protocols run between regions.
VLAN Virtual Local Area Network. A switched network and an end-to-end
logical network that is constructed by using the network management
software across different network segments and networks. A VLAN forms
a logical subnet, that is, a logical broadcast domain. One VLAN can
include multiple network devices.
Definition
Ethernet Ring Protection Switching (ERPS) is a protocol defined by the International
Telecommunication Union - Telecommunication Standardization Sector (ITU-T) to prevent
loops at Layer 2. As the standard number is ITU-T G.8032/Y.1344, ERPS is also called G.8032.
ERPS defines Ring Auto Protection Switching (RAPS) Protocol Data Units (PDUs) and
protection switching mechanisms. It can be used for communication between Huawei and
non-Huawei devices on a ring network.
ERPSv1 and ERPSv2 are currently available. ERPSv1 was released by the ITU-T in June
2008, and ERPSv2 was released by the ITU-T in August 2010. ERPSv2, fully compatible
with ERPSv1, extends ERPSv1 functions. Table 1-124 compares ERPSv1 and ERPSv2.
Ring type: ERPSv1 supports single rings only. ERPSv2 supports single rings and
multi-rings; a multi-ring topology comprises major rings and sub-rings.
Port role configuration: ERPSv1 supports the RPL owner port and ordinary ports.
ERPSv2 supports the RPL owner port, RPL neighbor port, and ordinary ports.
Topology change notification: Not supported in ERPSv1. Supported in ERPSv2.
R-APS PDU transmission modes on sub-rings: Not supported in ERPSv1. Supported in
ERPSv2.
Revertive and non-revertive switching: ERPSv1 supports revertive switching by default
and does not support non-revertive switching or switching mode configuration. ERPSv2
supports both.
Manual port blocking: Not supported in ERPSv1. ERPSv2 supports forced switch (FS)
and manual switch (MS).
As ERPSv2 is fully compatible with ERPSv1, configuring ERPSv2 is recommended if all devices on an
ERPS ring support both ERPSv1 and ERPSv2.
Purpose
Generally, redundant links are used on an Ethernet switching network to provide link backup
and enhance network reliability. The use of redundant links, however, may produce loops,
causing broadcast storms and rendering the MAC address table unstable. As a result, the
communication quality deteriorates, and communication services may even be interrupted. To
resolve these problems, ERPS can be used for loop avoidance purposes.
ERPS blocks the ring protection link (RPL) owner port to remove loops and unblocks it if a
link fault occurs to promptly restore communication.
Table 1-125 compares various ring network protocols.
Benefits
This feature offers the following benefits:
Protects services and prevents broadcast storms on ring networks.
Meets carrier-class reliability requirements for network convergence.
Allows communication between Huawei and non-Huawei devices on ring networks.
1.7.9.2 Principles
1.7.9.2.1 Basic Concepts
Introduction
Ethernet Ring Protection Switching (ERPS) is a protocol used to block specified ports to
prevent loops at the link layer of an Ethernet network.
On the network shown in Figure 1-454, Device A through Device D constitute a ring and are
dual-homed to an upstream IP/MPLS network. This access mode will cause a loop on the
entire network. To eliminate redundant links and ensure link connectivity, ERPS is used to
prevent loops.
Figure 1-454 shows a typical ERPS single-ring network. The following describes ERPS based
on this networking:
ERPS Ring
An ERPS ring consists of interconnected switches that have the same control VLAN. A ring is
a basic ERPS unit.
ERPS rings are classified as major rings (closed) or sub-rings (open). On the network shown
in Figure 1-455, Device A through Device D constitute a major ring, and Device C through
Device F constitute a sub-ring.
Only ERPSv2 supports sub-rings.
Node
A node refers to a switch added to an ERPS ring. A node can have a maximum of two ports
added to the same ERPS ring. Device A through Device D in Figure 1-454 are nodes on an
ERPS major ring.
Port Role
ERPS defines three port roles: ring protection link (RPL) owner port, RPL neighbor port (only
in ERPSv2), and ordinary port.
RPL owner port
An RPL owner port is a ring port responsible for blocking traffic over the RPL to prevent
loops. An ERPS ring has only one RPL owner port.
When the node on which the RPL owner port resides receives an R-APS PDU indicating
that a link or node on the ring fails, it unblocks the RPL owner port to allow the port to
send and receive traffic. This process ensures that traffic is not interrupted.
RPL neighbor port
An RPL neighbor port is a ring port directly connected to an RPL owner port and is used
to reduce the number of times that filtering database (FDB) entries are refreshed.
RPL owner and neighbor ports are both blocked under normal conditions to prevent
loops.
If an ERPS ring fails, both RPL owner and neighbor ports are unblocked.
Ordinary port
Ordinary ports are ring ports other than the RPL owner and neighbor ports.
An ordinary port monitors the status of the directly connected ERPS link and sends
R-APS PDUs to inform the other ports if the link status changes.
Port Status
On an ERPS ring, an ERPS-enabled port can be in either of the following states:
Forwarding: The port forwards user traffic and sends and receives R-APS PDUs.
Discarding: The port does not forward user traffic but can receive and send ERPS R-APS
PDUs.
Control VLAN
A control VLAN is configured for an ERPS ring to transmit R-APS PDUs. Each ERPS ring
must be configured with a control VLAN. After a port is added to an ERPS ring that has a
control VLAN configured, the port is added to the control VLAN automatically. Different
ERPS rings cannot be configured with the same control VLAN ID.
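The control VLAN constraints above (one control VLAN per ring, no VLAN ID shared by two rings) can be sketched as a simple validation. This is an illustrative model with hypothetical names, not a device configuration interface.

```python
# Illustrative sketch: every ERPS ring needs its own control VLAN, and
# two rings must not share a control VLAN ID.

class ErpsDomain:
    def __init__(self):
        self.control_vlans = {}  # control VLAN ID -> ring ID

    def create_ring(self, ring_id, control_vlan):
        owner = self.control_vlans.get(control_vlan)
        if owner is not None:
            raise ValueError(
                f"control VLAN {control_vlan} already used by ring {owner}")
        self.control_vlans[control_vlan] = ring_id


domain = ErpsDomain()
domain.create_ring(1, control_vlan=100)
try:
    domain.create_ring(2, control_vlan=100)  # rejected: duplicate control VLAN
except ValueError as err:
    print(err)  # prints: control VLAN 100 already used by ring 1
```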
Unlike control VLANs, data VLANs are used to transmit data packets.
ERP Instance
On a router running ERPS, the VLAN in which R-APS PDUs and data packets are
transmitted must be mapped to an Ethernet Ring Protection (ERP) instance so that ERPS
forwards or blocks the VLAN packets based on blocking rules. Otherwise, VLAN packets
will probably cause broadcast storms on the ring network and render the network unavailable.
Timer
ERPS defines four timers: guard timer, WTR timer, hold-off timer, and WTB timer (only in
ERPSv2).
Guard timer
After a faulty link or node recovers or a clear operation is executed, the nodes on the two
ends of the link (or the recovered node) send R-APS No Request (NR) messages to
inform the other nodes of the link or node recovery and start a guard timer. Before the
timer expires, each involved node does not process any R-APS PDUs to avoid receiving
out-of-date R-APS (SF) messages. After the timer expires, if the involved node still
receives an R-APS (SF) message, the local port enters the Forwarding state.
WTR timer
If the RPL owner port is unblocked due to a link or node failure, the involved port may
not go Up immediately after the link or node recovers. To prevent the RPL owner port
from alternating between Up and Down, the node where the RPL owner port resides
starts a WTR timer after receiving an R-APS (NR) message. If the node receives an
R-APS Signal Fail (SF) message before the timer expires, it terminates the WTR timer
(R-APS SF message: a message sent by a node to other nodes after the node in an ERPS
ring detects that one of its ring ports becomes Down). If the node does not receive any
R-APS (SF) message before the timer expires, it blocks the RPL owner port when the
timer expires and sends an R-APS (NR, RB) message. After receiving this R-APS (NR,
RB) message, the nodes set their recovered ports on the ring to the Forwarding state.
Hold-off timer
Protection switching sequence requirements vary for Layer 2 networks running ERPS.
For example, in a multi-layer service application, a certain period of time is required for
a server to recover should it fail. (During this period, no protection switching is
performed, and the client does not detect the failure.) A hold-off timer can be set to
ensure that the server is given adequate time to recover. If a fault occurs, the fault is not
immediately reported to ERPS. Instead, the hold-off timer starts. If the fault persists after
the timer expires, the fault will be reported to ERPS.
WTB timer
The WTB timer starts after an FS or MS operation is performed. When multiple nodes
on an ERPS ring are in the FS or MS state, the clear operation takes effect only after the
WTB timer expires. This ensures that the RPL owner port will not be blocked
immediately.
The WTB timer value cannot be configured. Its value is the guard timer value plus 5 seconds.
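The fixed relationship stated above can be expressed directly; the sketch below assumes both timers are measured in seconds.

```python
# Sketch of the fixed timer relationship: the WTB value is not
# configurable and is derived from the guard timer value plus 5 seconds.

def wtb_timer_seconds(guard_timer_seconds):
    return guard_timer_seconds + 5

# A 2-second guard timer (example value, not a documented default)
# yields a 7-second WTB timer.
print(wtb_timer_seconds(2))  # prints 7
```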
By default, sub-rings use NVCs to transmit R-APS PDUs, except for the scenario shown in
Figure 1-457.
When sub-ring links are not contiguous, VCs must be used. On the network shown in Figure 1-457,
links b and d belong to major rings 1 and 2, respectively; links a and c belong to the sub-ring. Because
links a and c are not contiguous, they cannot detect the status change between each other. Therefore,
VCs must be used for R-APS PDU transmission.
Table 1-126 lists the advantages and disadvantages of R-APS PDU transmission modes on
sub-rings with VCs or NVCs.
Table 1-126 Comparison between R-APS PDU transmission modes on sub-rings with VCs or
NVCs
MEL (3 bits): Identifies the maintenance entity group (MEG) level of the R-APS PDU.
Version (5 bits): 0x00 is used in ERPSv1; 0x01 is used in ERPSv2.
OpCode (8 bits): Indicates an R-APS PDU. The value of this field is 0x28.
Flags (8 bits): Reserved. The value of this field is fixed at 0x00.
TLV Offset (8 bits): Indicates that the TLV starts after an offset of 32 bytes. The value
of this field is fixed at 0x20.
R-APS Specific Information (32 x 8 bits): Carries R-APS ring information and is the
core of an R-APS PDU. Some of its sub-fields have different meanings in ERPSv1 and
ERPSv2. Figure 1-459 shows the R-APS Specific Information field format in ERPSv1.
Figure 1-460 shows the field format in ERPSv2.
TLV (not limited): Describes information to be loaded. The End TLV value is 0x00.
Node ID (6 x 8 bits): Identifies the MAC address of a node on the ERPS ring. It is
informational and does not affect protection switching on the ERPS ring.
Reserved 2 (24 x 8 bits): Reserved for future extension and should be ignored upon
reception. Currently, this sub-field should be encoded as all 0s in transmission.
A Link Fails
As shown in Figure 1-462, if the link between Device D and Device E fails, the ERPS
protection switching mechanism is triggered. The ports on both ends of the faulty link are
blocked, and the RPL owner port and RPL neighbor port are unblocked to send and receive
traffic. This mechanism ensures that traffic is not interrupted. The process is as follows:
1. After Device D and Device E detect the link fault, they block their ports on the faulty
link and perform a Filtering Database (FDB) flush.
2. Device D and Device E send three consecutive R-APS Signal Fail (SF) messages to the
other LSWs and then send one R-APS (SF) message every 5 seconds thereafter.
3. After receiving an R-APS (SF) message, the other LSWs perform an FDB flush. Device
C on which the RPL owner port resides and Device B on which the RPL neighbor port
resides unblock the respective RPL owner port and RPL neighbor port, and perform an
FDB flush.
Figure 1-462 ERPS single ring networking (unblocking the RPL owner port and RPL neighbor
port if a link fails)
3. After receiving an R-APS (NR, RB) message, Device D and Device E unblock the ports
at the two ends of the link that has recovered, stop sending R-APS (NR) messages, and
perform an FDB flush. The other LSWs also perform an FDB flush after receiving an
R-APS (NR, RB) message.
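The failure-handling steps above can be reduced to a toy simulation: the nodes adjacent to the fault block their ports and flush the FDB, every node flushes its FDB on receiving R-APS (SF), and the RPL owner and neighbor unblock their RPL ports. This is an illustrative sketch of the sequence, not vendor behavior; the node names match the example topology.

```python
# Toy simulation of single-ring ERPS failure handling: on R-APS (SF),
# every node flushes its FDB, and the nodes holding the RPL owner and
# RPL neighbor ports unblock them.

class RingNode:
    def __init__(self, name, rpl_role=None):
        self.name = name
        self.rpl_role = rpl_role              # "owner", "neighbor", or None
        self.rpl_blocked = rpl_role is not None
        self.fdb_flushes = 0

    def receive_sf(self):
        self.fdb_flushes += 1                 # FDB flush on R-APS (SF)
        if self.rpl_role:
            self.rpl_blocked = False          # unblock the RPL port


ring = [RingNode("A"), RingNode("B", "neighbor"), RingNode("C", "owner"),
        RingNode("D"), RingNode("E")]

# The D-E link fails: D and E block their faulty-link ports, then every
# node receives an R-APS (SF) message.
for node in ring:
    node.receive_sf()

print(all(n.fdb_flushes == 1 for n in ring))      # prints True
print(ring[1].rpl_blocked, ring[2].rpl_blocked)   # prints False False
```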
Protection Switching
Forced switch
On the network shown in Figure 1-463, Device A through Device E on the ERPS ring
can communicate with each other. A forced switch (FS) operation is performed on the
Device E's port that connects to Device D, and the Device E's port is blocked. The RPL
owner port and RPL neighbor port are then unblocked to send and receive traffic. This
ensures that traffic is not interrupted. The process is as follows:
a. After the Device E's port that connects to Device D is forcibly blocked, Device E
performs an FDB flush.
b. Device E sends three consecutive R-APS (SF) messages to the other LSWs and then
sends one R-APS (SF) message every 5 seconds thereafter.
c. After receiving an R-APS (SF) message, the other LSWs perform an FDB flush.
Device C on which the RPL owner port resides and Device B on which the RPL
neighbor port resides unblock the respective RPL owner port and RPL neighbor
port, and perform an FDB flush.
Clear
After a clear operation is performed on Device E, the port that is forcibly blocked by FS
sends R-APS (NR) messages to all other ports on the ERPS ring.
− If the ERPS ring uses revertive switching, the RPL owner port starts the WTB timer
after receiving an R-APS (NR) message. After the WTB timer expires, the FS
operation is cleared. The RPL owner port is then blocked, and the blocked port on
Device E is unblocked. If you perform a clear operation on Device C on which the
RPL owner port resides before the WTB timer expires, the RPL owner port is
immediately blocked, and the blocked port on Device E is unblocked.
− If the ERPS ring uses non-revertive switching and you want to block the RPL
owner port, perform a clear operation on Device C on which the RPL owner port
resides.
Manual switch
Compared with an FS operation, a manual switch (MS) operation triggers protection
switching in a similar way, except that an MS operation does not take effect while an
FS, MS, or link failure condition exists.
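The precedence rule above can be sketched as a simple priority comparison. This is an illustrative simplification of the ITU-T G.8032 request priority scheme; the numeric values and the reduced set of request types are assumptions for illustration, not the full protocol state machine.

```python
# Illustrative sketch of ERPS request precedence (simplified from the
# ITU-T G.8032 priority ordering; values are assumptions for illustration).
# A new request takes effect only if its priority exceeds that of the
# currently active request, so an MS request is ignored while an FS,
# SF (link failure), or another MS condition is active.
PRIORITY = {
    "FS": 3,  # forced switch: highest operator request
    "SF": 2,  # signal fail (link failure)
    "MS": 1,  # manual switch
    "NR": 0,  # no request
}

def request_takes_effect(current: str, new: str) -> bool:
    """Return True if the new request preempts the active one."""
    return PRIORITY[new] > PRIORITY[current]

print(request_takes_effect("FS", "MS"))  # False: MS ignored during FS
print(request_takes_effect("NR", "MS"))  # True: MS takes effect on an idle ring
```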
A Link Fails
As shown in Figure 1-465, if the link between Device D and Device G fails, the ERPS
protection switching mechanism is triggered. The ports on both ends of the faulty link are
blocked, and the RPL owner port on sub-ring 2 is unblocked to send and receive traffic. In this
situation, traffic from PC1 still travels along the original path. Device C and Device D inform
the other nodes on the major ring of the topology change so that traffic from PC2 is also not
interrupted. Traffic between PC2 and the upper-layer network travels along the path PC2 <->
Device G <-> Device C <-> Device B <-> Device A <-> Device E <-> PE2. The process is as
follows:
1. After Device D and Device G detect the link fault, they block their ports on the faulty
link and perform a Filtering Database (FDB) flush.
2. Device G sends three consecutive R-APS (SF) messages to the other LSWs and then
sends one R-APS (SF) message at an interval of 5s afterwards.
3. Device G then unblocks the RPL owner port and performs an FDB flush.
4. After the interconnection node Device C receives an R-APS (SF) message, it performs
an FDB flush. Device C and Device D then send R-APS Event messages within the
major ring to notify the topology change in sub-ring 2.
5. After receiving an R-APS Event message, the other LSWs on the major ring perform an
FDB flush.
Then traffic from PC2 is switched to a normal link.
Figure 1-465 ERPS multi-ring networking (unblocking the RPL owner port if a link fails)
If the ERPS ring uses non-revertive switching, the RPL remains unblocked, and the link
that has recovered remains blocked.
The following example uses revertive switching to describe the process after the link
recovers.
1. After the link between Device D and Device G recovers, Device D and Device G start a
guard timer to avoid processing out-of-date R-APS PDUs. The two devices do not process
any R-APS PDUs before the timer expires. After the timer expires, Device D and Device G
send R-APS (NR) messages within sub-ring 2.
2. Device G on which the RPL owner port resides starts the WTR timer. After the WTR
timer expires, Device G blocks the RPL owner port and unblocks its port on the link that
has recovered and then sends R-APS (NR, RB) messages within sub-ring 2.
3. After receiving an R-APS (NR, RB) message from Device G, Device D unblocks its port
on the recovered link, stops sending R-APS (NR) messages, and performs an FDB flush.
Device C also performs an FDB flush.
4. Device C and Device D, the interconnection nodes, then send R-APS Event messages
within the major ring to notify the link recovery of sub-ring 2.
5. After receiving an R-APS Event message, the other LSWs on the major ring perform an
FDB flush.
Then traffic changes to the normal state, as shown in Figure 1-464.
On the network shown in Figure 1-467, Device A, Device B, and Device C form an ERPS
ring. Three relay nodes exist between Device A and Device C. Ethernet CFM is configured on
Device A and Device C. Interface 1 on Device A is associated with Interface 1 on Relay 1, and
Interface 1 on Device C is associated with Interface 1 on Relay 3.
In normal situations, the RPL owner port sends R-APS (NR) messages to all other nodes on
the ring at an interval of 5s, indicating that ERPS links are normal.
Figure 1-467 ERPS ring over transmission links (links are normal)
If Relay 2 fails, Device A and Device C detect the Ethernet CFM failure, block their Interface
1, send R-APS (SF) messages through their respective interfaces connected to Device B, and
then perform a Filtering Database (FDB) flush.
After receiving an R-APS (SF) message, Device B unblocks the RPL owner port and
performs an FDB flush. Figure 1-468 shows the networking after Relay 2 fails.
After Relay 2 recovers, Device B, which uses revertive switching, re-blocks the RPL owner
port and sends R-APS (NR, RB) messages.
After Device A and Device C receive an R-APS (NR, RB) message, they unblock their
blocked Interface 1 and perform an FDB flush so that traffic reverts to the normal state,
as shown in Figure 1-467.
1.7.9.3 Applications
1.7.9.3.1 ERPS Layer 2 Protocol Tunneling Application
Redundant links are used on an Ethernet switching network to provide link backup and
enhance network reliability. The use of redundant links, however, may produce loops, causing
broadcast storms and rendering the MAC address table unstable. As a result, the
communication quality deteriorates, and communication services may be interrupted.
To prevent loops caused by redundant links, enable ERPS on the nodes of the ring network.
ERPS is a Layer 2 loop-breaking protocol defined by the ITU-T. It provides fast
convergence, within 50 ms.
On the network shown in Figure 1-469, Device A through Device E constitute a major ring;
Device A, Device C, and Device F constitute a sub-ring; Device C, Device D, and Device G
constitute another sub-ring. The ERPS ring network resides at the aggregation layer, and
therefore is an aggregation ring. The aggregation ring aggregates Layer 2 services to the
upstream Layer 3 network, providing Layer 2 protection switching. VLANIF interfaces are
configured on Device A and Device E for Layer 3 access. VRRP is configured on the
VLANIF interfaces to implement the virtual gateway function, and peer BFD is enabled for
fast fault detection and accordingly fast VRRP switching.
If ERPS multi-instances are configured, ERPS is implemented in the same manner as that in
Figure 1-469, except that two logical ERPS rings are configured on the physical ring in Figure
1-469, and each logical ERPS ring has its switches, port roles, and control VLANs
independently configured.
Terms
Term Description
FDB Filtering database. A collection of entries for guiding data forwarding. There are
Layer 2 FDBs and Layer 3 FDBs. The Layer 2 FDB refers to the MAC table, which
provides information about MAC addresses and outbound interfaces and guides
Layer 2 forwarding. The Layer 3 FDB refers to the ARP table, which provides
information about IP addresses and outbound interfaces and guides Layer 3
forwarding.
MSTP Multiple Spanning Tree Protocol. A spanning tree protocol defined in IEEE
802.1s. MSTP uses the concepts of region and instance. Based on different
requirements, MSTP divides a large network into regions where instances are
created. These instances are mapped to VLANs. BPDUs carrying region and
instance information are transmitted between bridges, and a bridge determines
which region it belongs to based on the information carried in BPDUs.
RSTP Rapid Spanning Tree Protocol. A protocol defined in IEEE 802.1w, which was
released in 2001. RSTP amends and supplements STP, implementing rapid
convergence.
STP Spanning Tree Protocol. A protocol defined in IEEE 802.1d, which was released
in 1998. This protocol is used to eliminate loops on a LAN. The routers running
STP detect loops on the network by exchanging information with each other and
block specified interfaces to eliminate loops.
Definition
MAC flapping-based loop detection is a method for detecting Ethernet loops based on the
frequency of MAC address entry flapping.
Purpose
Generally, redundant links are used on an Ethernet network to provide link backup and
enhance network reliability. Redundant links, however, may produce loops and cause
broadcast storms and MAC address entry flapping. As a result, the communication quality
deteriorates, and communication services may even be interrupted. To eliminate loops on the
network, the spanning tree protocols and Layer 2 loop detection technology were introduced.
To apply a spanning tree protocol, every user network device must support the protocol and
have it configured. To apply the Layer 2 loop detection technology, user network devices
must allow Layer 2 loop detection packets to pass. Therefore, neither the spanning tree
protocols nor the Layer 2 loop detection technology can eliminate loops on user networks
with unknown connections or on user networks that do not support these technologies.
MAC flapping-based loop detection is introduced to address this problem. It does not require
protocol packet negotiation between devices. A device independently checks whether a loop
occurs on the network based on MAC address entry flapping.
Devices can block redundant links based on the frequency of MAC address entry flapping to
eliminate loops on the network.
Benefits
This feature offers the following benefits to carriers:
Eliminates loops on a network of any topology.
Prevents broadcast storms and provides timely and reliable communication.
1.7.10.2 Principles
1.7.10.2.1 Principles of MAC Flapping-based Loop Detection
MAC flapping-based loop detection is a method for detecting Ethernet loops based on the
frequency of MAC address entry flapping. It eliminates loops on networks by blocking
redundant links. On a virtual private LAN service (VPLS) network, MAC flapping-based loop
detection can be applied to block attachment circuit (AC) interfaces and pseudo wires (PWs).
This section describes AC interface blocking.
On the network shown in Figure 1-470, the consumer edge (CE) is dual-homed to the
provider edges (PEs) of the Ethernet network. To avoid loops and broadcast storms, deploy
MAC flapping-based loop detection on PE1, PE2, and the CE. For example, when receiving
user packets from the CE, PE1 records in its MAC address table the CE MAC address as the
source MAC address and port1 as the outbound interface. When PE1 receives packets
forwarded by PE2 from the CE, the source MAC address of the packets remains unchanged,
but the outbound interface changes. In this case, PE1 updates the CE's MAC address entry in
its MAC address table. Because PE1 repeatedly receives user packets with the same source
MAC address through different interfaces, PE1 constantly updates the MAC address entry. In
this situation, with MAC flapping-based loop detection, PE1 detects the MAC address
flapping and concludes that a loop has occurred. PE1 then blocks its port1 and generates an
alarm, or it just generates an alarm, depending on user configurations.
After MAC flapping-based loop detection is configured on a device and the device receives
packets with fake source MAC addresses from attackers, the device may mistakenly conclude
that a loop has occurred and block an interface based on the configured blocking policy.
Therefore, key user traffic may be blocked. It is recommended that you disable MAC
flapping-based loop detection on properly running devices. If you have to use MAC
flapping-based loop detection to detect whether links operate properly during site deployment,
be sure to disable this function after this stage.
The basic concepts for MAC flapping-based loop detection are as follows:
Detection cycle
If a device detects a specified number of MAC address entry flaps within a detection
cycle, the device concludes that a loop has occurred. The detection cycle is configurable.
Temporary blocking
If a device concludes that a loop has occurred, it blocks an interface or PW for a
specified period of time.
Permanent blocking
After an interface or a PW is blocked and then unblocked, if the total number of times
that loops occur exceeds the configured maximum number, the interface or PW is
permanently blocked.
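The three concepts above (detection cycle, temporary blocking, permanent blocking) can be combined into a minimal detector sketch. All thresholds, timer defaults, and names here are hypothetical illustrations, not the device's actual defaults or implementation.

```python
import time

class MacFlapDetector:
    """Sketch of MAC flapping-based loop detection. Illustrative only:
    thresholds, timer values, and method names are assumptions."""

    def __init__(self, flap_threshold=10, cycle_s=10.0,
                 block_time_s=30.0, max_loop_count=3):
        self.flap_threshold = flap_threshold  # flaps per detection cycle
        self.cycle_s = cycle_s                # detection cycle length
        self.block_time_s = block_time_s      # temporary blocking duration
        self.max_loop_count = max_loop_count  # loops before permanent blocking
        self.mac_table = {}                   # MAC address -> interface
        self.flaps = []                       # timestamps of recent flaps
        self.loop_count = 0
        self.blocked_until = 0.0              # float("inf") = permanent

    def learn(self, mac, interface, now=None):
        """Record a source-MAC learning event; a changed outbound
        interface for a known MAC counts as one flap."""
        now = time.monotonic() if now is None else now
        old = self.mac_table.get(mac)
        self.mac_table[mac] = interface
        if old is not None and old != interface:
            # Keep only flaps inside the current detection cycle.
            self.flaps = [t for t in self.flaps if now - t < self.cycle_s]
            self.flaps.append(now)
            if len(self.flaps) >= self.flap_threshold:
                self._loop_detected(now)

    def _loop_detected(self, now):
        self.loop_count += 1
        self.flaps.clear()
        if self.loop_count > self.max_loop_count:
            self.blocked_until = float("inf")             # permanent blocking
        else:
            self.blocked_until = now + self.block_time_s  # temporary blocking

    def is_blocked(self, now=None):
        now = time.monotonic() if now is None else now
        return now < self.blocked_until
```

Feeding the detector the same source MAC address alternating between two interfaces crosses the threshold within one cycle, first blocking temporarily and, after repeated loops, permanently.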
1.7.10.3 Applications
1.7.10.3.1 MAC Flapping-based Loop Detection for VPLS Networks
On the virtual private LAN service (VPLS) network shown in Figure 1-471, pseudo wires
(PWs) are established over Multiprotocol Label Switching (MPLS) tunnels between virtual
private network (VPN) sites to transparently transmit Layer 2 packets. When forwarding
packets, the provider edges (PEs) learn the source MAC addresses of the packets, create MAC
address entries, and establish mapping between the MAC addresses and AC interfaces and
mapping between the MAC addresses and PWs.
Figure 1-471 VPLS network with MAC flapping-based loop detection enabled
On the network shown in Figure 1-471, CE2 and CE3 are connected to PE1 to provide
redundant links. This deployment may generate loops because the connections on the user
network of CE2 and CE3 are unknown. Specifically, if CE2 and CE3 are connected, PE1
interfaces connected to CE2 and CE3 may receive user packets with the same source MAC
address, causing MAC address entry flapping or even damaging MAC address entries. In this
situation, you can deploy MAC flapping-based loop detection on PE1 and configure a
blocking policy for AC interfaces to prevent such loops. The blocking policy can be either of
the following:
Blocking interfaces based on their blocking priorities: If a device detects a loop, it blocks
the interface with a lower blocking priority.
Blocking interfaces based on their trusted or untrusted states: If a device detects a loop, it
blocks the untrusted interface.
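The two blocking policies above can be sketched as a selection function; the dictionary layout, field names, and the convention that a larger number means a higher blocking priority are assumptions for illustration.

```python
def pick_interface_to_block(if_a, if_b, policy="priority"):
    """Sketch of the two AC-interface blocking policies (illustrative:
    field names and numeric conventions are assumptions). Each interface
    is a dict like {"name": str, "priority": int, "trusted": bool}."""
    if policy == "priority":
        # Block the interface with the lower blocking priority.
        return if_a if if_a["priority"] < if_b["priority"] else if_b
    # policy == "trusted": block the untrusted interface.
    return if_a if not if_a["trusted"] else if_b

a = {"name": "port1", "priority": 10, "trusted": True}
b = {"name": "port2", "priority": 20, "trusted": False}
print(pick_interface_to_block(a, b, "priority")["name"])  # port1
print(pick_interface_to_block(a, b, "trusted")["name"])   # port2
```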
MAC flapping-based loop detection can also detect PW-side loops. The principles of blocking
PWs are similar to those of blocking AC interfaces.
In addition, MAC flapping-based loop detection can associate an interface with its
sub-interfaces bound with virtual switching instances (VSIs). If a loop occurs in the VSI
bound to a sub-interface, the sub-interface is blocked. However, a loop may also exist in a
VSI bound to another sub-interface. If the loop is not eliminated in time, it will cause traffic
congestion or even a network breakdown. To inform the network administrator of loops,
enable MAC flapping-based loop detection association on the interface of the sub-interfaces
bound with VSIs. In this situation, if a sub-interface bound with a VSI is blocked due to a
loop, the interface on which the sub-interface is configured is also blocked and an alarm is
generated. After that, all the other sub-interfaces bound with VSIs are blocked.
Terms
None
Abbreviations
AC attachment circuit
MAC Media Access Control
PW pseudo wire
STP Spanning Tree Protocol
VPLS virtual private LAN service
VSI virtual switching instance
1.7.11 VXLAN
1.7.11.1 Introduction
Definition
Virtual extensible local area network (VXLAN) is a Network Virtualization over Layer 3
(NVO3) technology that uses MAC-in-UDP encapsulation.
Purpose
As a widely deployed core cloud computing technology, server virtualization greatly reduces
IT and O&M costs and improves service deployment flexibility.
On the network shown in Figure 1-472, a server is virtualized into multiple virtual machines
(VMs), each of which functions as a host. A great increase in the number of hosts causes the
following problems:
VM scale is limited by the network specification.
On a legacy large Layer 2 network, data packets are forwarded at Layer 2 based on MAC
entries. However, there is a limit on the MAC table capacity, which subsequently limits
the number of VMs.
Network isolation capabilities are limited.
Most networks currently use VLANs to implement network isolation. However, the
deployment of VLANs on large-scale virtualized networks has the following limitations:
− The VLAN tag field defined in IEEE 802.1Q has only 12 bits and can support only
a maximum of 4094 VLANs, which cannot meet user identification requirements of
large Layer 2 networks.
− VLANs on legacy Layer 2 networks cannot adapt to dynamic network adjustment.
VM migration scope is limited by the network architecture.
After a VM is started, it may need to be migrated to a new server due to resource issues
on the original server, for example, when the CPU usage is too high or memory
resources are inadequate. To ensure uninterrupted services during VM migration, the IP
and MAC addresses of the VM must remain unchanged. To carry this out, the service
network must be a Layer 2 network and also provide multipathing redundancy backup
and reliability.
VXLAN addresses the preceding problems on large Layer 2 networks.
Eliminates VM scale limitations imposed by network specifications.
VXLAN encapsulates data packets sent from VMs into UDP packets and encapsulates IP
and MAC addresses used on the physical network into the outer headers. Then the
network is only aware of the encapsulated parameters and not the inner data. This greatly
reduces the MAC address specification requirements of large Layer 2 networks.
Provides greater network isolation capabilities.
VXLAN uses a 24-bit network segment ID, called the VXLAN network identifier (VNI), to
identify users. The VNI is similar to a VLAN ID and supports a maximum of 16M (2^24, or
16 x 1024^2) VXLAN segments.
Eliminates VM migration scope limitations imposed by network architecture.
VXLAN uses MAC-in-UDP encapsulation to extend Layer 2 networks. It encapsulates
Ethernet packets into IP packets so that they can be transmitted over routes, without the
transport network being aware of VMs' MAC addresses. Because this places no limitation
on the Layer 3 network architecture, Layer 3 networks are scalable and have strong
automatic fault rectification and load balancing capabilities. This allows for VM
migration irrespective of the network architecture.
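The 16M figure quoted for 24-bit VNIs can be checked directly:

```python
# A 24-bit VNI field yields 2^24 identifiers, i.e. 16M in binary units.
vni_bits = 24
total = 2 ** vni_bits
print(total)               # 16777216
print(total // 1024 ** 2)  # 16 (that is, 16M)
```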
Benefits
As server virtualization is being rapidly deployed on data centers based on physical network
infrastructure, VXLAN offers the following benefits:
A maximum of 16M VXLAN segments are supported using 24-bit VNIs, which allows a
data center to accommodate multiple tenants.
Non-VXLAN network edge devices do not need to identify the VM's MAC address,
which reduces the number of MAC addresses that have to be learned and enhances
network performance.
MAC-in-UDP encapsulation extends Layer 2 networks, decoupling physical and virtual
networks. Tenants can plan their own virtual networks without being limited by the
physical network IP addresses or broadcast domains. This greatly simplifies network
management.
1.7.11.2 Principles
1.7.11.2.1 Basic Concepts
Virtual extensible local area network (VXLAN) is an NVO3 network virtualization
technology that encapsulates data packets sent from virtual machines (VMs) into UDP packets
and encapsulates IP and MAC addresses used on the physical network in outer headers before
sending the packets over an IP network. The egress tunnel endpoint then decapsulates the
packets and sends the packets to the destination VM.
VXLAN allows a virtual network to provide access services to a large number of tenants. In
addition, tenants are able to plan their own virtual networks, not limited by the physical
network IP addresses or broadcast domains. This greatly simplifies network management.
Table 1-129 describes VXLAN concepts.
Concept Description
Underlay and overlay networks VXLAN allows virtual Layer 2 or Layer 3 networks
(overlay networks) to be built over existing physical networks (underlay networks).
Overlay networks use encapsulation technologies to transmit tenant packets between
sites over Layer 3 forwarding paths provided by underlay networks. Tenants are aware
of only overlay networks.
Network virtualization edge (NVE) A network entity that is deployed at the network
edge and implements network virtualization functions.
NOTE
vSwitches on devices and servers can function as NVEs.
VXLAN tunnel endpoint (VTEP) A VXLAN tunnel endpoint that encapsulates and
decapsulates VXLAN packets. It is represented by an NVE.
A VTEP connects to a physical network and is assigned a physical network IP address.
This IP address is irrelevant to virtual networks.
In VXLAN packets, the source IP address is the local node's VTEP address, and the
destination IP address is the remote node's VTEP address. This pair of VTEP addresses
corresponds to a VXLAN tunnel.
VXLAN network identifier (VNI) A VXLAN segment identifier similar to a VLAN ID.
VMs on different VXLAN segments cannot communicate directly at Layer 2.
A VNI identifies only one tenant. Even if multiple terminal users belong to the same
VNI, they are considered one tenant. A VNI consists of 24 bits and supports a maximum
of 16M (2^24) tenants.
Bridge domain (BD) A Layer 2 broadcast domain through which VXLAN data packets
are forwarded.
VNIs identifying virtual networks must be mapped to BDs so that a BD can function as
a VXLAN network entity to transmit VXLAN traffic.
VBDIF interface A Layer 3 logical interface created for a BD. Configuring IP addresses
for VBDIF interfaces allows communication between VXLANs on different network
segments and between VXLANs and non-VXLANs, and implements Layer 2 network
access to a Layer 3 network.
Virtual access point (VAP) A Layer 2 sub-interface used to transmit data packets.
Layer 2 sub-interfaces can have different encapsulation types configured to transmit
various types of data packets.
Gateway A device that ensures communication between VXLANs identified by
different VNIs and between VXLANs and non-VXLANs.
A VXLAN gateway can be a Layer 2 or Layer 3 gateway.
Layer 2 gateway: allows tenants to access VXLANs and implements intra-segment
communication on a VXLAN.
Layer 3 gateway: allows inter-segment VXLAN communication and access to external
networks.
Field Description
VXLAN header VXLAN Flags (8 bits): The value is 00001000.
VNI (24 bits): VXLAN segment ID or VXLAN network identifier, used to identify a
VXLAN segment.
Reserved fields (24 bits and 8 bits): must be set to 0.
Outer UDP header DestPort: destination port number, which is 4789 for VXLAN.
Source Port: source port number, which is calculated by hashing the inner Ethernet
frame headers.
Outer Ethernet header MAC DA: destination MAC address, which is the MAC address
mapped to the next-hop IP address (looked up, based on the destination VTEP address,
in the routing table of the VTEP on which the VM that sends packets resides).
MAC SA: source MAC address, which is the MAC address of the VTEP on which the
VM that sends packets resides.
802.1Q Tag: VLAN tag carried in packets. This field is optional.
Ethernet Type: Ethernet packet type.
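The 8-byte VXLAN header layout described above (a flags byte of 0x08, 24 reserved bits, the 24-bit VNI, and 8 more reserved bits, in network byte order) can be sketched as follows. This is an illustration of the field layout, not the device's actual encapsulation code.

```python
import struct

def build_vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VXLAN header: Flags (0x08), 24 reserved bits,
    VNI (24 bits), 8 reserved bits; all reserved bits are 0."""
    assert 0 <= vni < 2 ** 24, "VNI is a 24-bit value"
    flags_and_reserved = 0x08 << 24  # flags byte followed by 24 zero bits
    vni_and_reserved = vni << 8      # 24-bit VNI followed by 8 zero bits
    return struct.pack("!II", flags_and_reserved, vni_and_reserved)

# Example: VNI 5000 (0x001388); the outer UDP header would carry
# destination port 4789, which is outside this sketch.
hdr = build_vxlan_header(5000)
print(hdr.hex())  # 0800000000138800
```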
Software mode: On the network shown in Figure 1-476, all NVEs are deployed on
vSwitches, which perform VXLAN encapsulation and decapsulation.
Hybrid mode: On the network shown in Figure 1-477, some NVEs are deployed on
vSwitches, and others on NVE-capable devices. Both vSwitches and NVE-capable
devices may perform VXLAN encapsulation and decapsulation.
c. Upon receipt of the EVPN routes, a gateway matches the export VPN target carried
in the route against the import VPN target of its local EVPN instance. If the two
VPN targets match, the gateway accepts the route and stores the VTEP IP address
and VNI carried in the route for later packet transmission over the VXLAN tunnel.
If the two VPN targets do not match, the gateway drops the route.
In this example, the import VPN target of one EVPN instance must match the export VPN target of the
other EVPN instance. Otherwise, the VXLAN tunnel cannot be established. If only one end can
successfully accept the IRB or IP prefix route, this end can establish a VXLAN tunnel to the other end,
but cannot exchange data packets with the other end. The other end drops packets after confirming that
there is no VXLAN tunnel to the end that has sent these packets.
Figure 1-479 VXLAN tunnel establishment using EVPN in centralized gateway scenarios
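The VPN-target check described above can be sketched as a set intersection, with the bidirectional requirement shown explicitly. The route-target strings and dictionary layout are made up for illustration.

```python
def accept_route(route_export_rts, local_import_rts):
    """Sketch of the VPN-target check (illustrative): a gateway accepts
    an EVPN route only if at least one export VPN target carried in the
    route matches an import VPN target of its local EVPN instance."""
    return bool(set(route_export_rts) & set(local_import_rts))

# VXLAN tunnel establishment with data exchange requires the match in
# both directions: each end must accept the other end's routes.
gw1 = {"import": {"100:1"}, "export": {"200:1"}}
gw2 = {"import": {"200:1"}, "export": {"100:1"}}
both_directions = (accept_route(gw1["export"], gw2["import"])
                   and accept_route(gw2["export"], gw1["import"]))
print(both_directions)  # True
```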
The routing protocol running on each leaf node can be either Multiprotocol Extensions for
BGP (MP-BGP) or BGP Ethernet Virtual Private Network (EVPN).
BGP EVPN
BGP EVPN defines a new type of BGP network layer reachability information (NLRI), called
EVPN NLRI. In a distributed VXLAN gateway scenario, EVPN serves as the VXLAN
control plane. It uses MAC advertisement route prefixes and IP prefix route prefixes carried in
EVPN NLRI as well as extended community attributes to transmit information required for
VXLAN tunnel establishment. Figure 1-482 illustrates the formats of a MAC advertisement
route prefix, an IP prefix route prefix, and an extended community attribute. The MAC
advertisement route prefix, IP prefix route prefix, and extended community attribute can form
different types of routes. Table 1-131 compares the two types of routes used in a distributed
VXLAN gateway scenario.
A host route or host network segment route is stored in the IP Address and IP Address Length
fields of a MAC advertisement route prefix or IP prefix route prefix.
Figure 1-482 Formats of the MAC advertisement route, IP prefix route, and extended community
attribute
Figure 1-483 illustrates the process of automatically establishing a VXLAN tunnel between
two distributed VXLAN gateways.
1. Create an EVPN instance and a VPN instance on each VXLAN gateway (Device 1 and
Device 2 in this example) and establish a BGP EVPN peer relationship between the two
devices.
2. Device 1 and Device 2 use BGP EVPN to exchange IRB or IP prefix routes.
− Upon receipt of an ARP request from a terminal, a gateway obtains the host's ARP
entry and generates a MAC advertisement route from this entry. Then, the gateway
advertises this route as an IRB route to the other gateway.
− A gateway imports the routes destined for the host address or host network segment
address to its local VPN instance. Then, the gateway imports these routes from the
local VPN instance to the local EVPN instance and advertises these routes to the
other gateway using an IP prefix route.
3. Upon receipt of the IRB or IP prefix route, a gateway matches the export VPN target
carried in the route against the import VPN target of its local EVPN instance. If the two
VPN targets match, the gateway accepts the route and stores the VTEP IP address and
VNI carried in the route for later packet transmission over the VXLAN tunnel. If the two
VPN targets do not match, the gateway drops the route.
In this example, the import VPN target of one EVPN instance must match the export VPN target of the
other EVPN instance. Otherwise, the VXLAN tunnel cannot be established. If only one end can
successfully accept the IRB or IP prefix route, this end can establish a VXLAN tunnel to the other end,
but cannot exchange data packets with the other end. The other end drops packets after confirming that
there is no VXLAN tunnel to the end that has sent these packets.
Figure 1-483 Establishing a VXLAN tunnel between two distributed VXLAN gateways using the
EVPN control plane
1. When VM1 communicates with VM2 for the first time, VM1 sends an ARP request with
an all-Fs destination MAC address to request Device 3's MAC address.
2. After the ARP request arrives at Device 1, Device 1 updates the locally saved MAC
address table and broadcasts the ARP request onto the local network segment.
3. After the ARP request arrives at Device 3, Device 3 updates the locally saved ARP table
and responds to Device 1 with an ARP reply with the source MAC address being MAC3.
4. Upon receipt, Device 1 updates the locally saved MAC address table.
5. After VM1 receives the ARP reply, VM1 updates the locally saved ARP table and sends
data packets to the Layer 3 gateway NVE3.
6. After the data packets arrive at Device 1, Device 1 searches the locally saved MAC
address table and finds that the outbound interface points to NVE3. Device 1
encapsulates the data packets into VXLAN packets as shown in Figure 1-485 and sends
the VXLAN packets to Device 3.
7. Upon receipt, Device 3 decapsulates the VXLAN packets to obtain and route the data
packets.
8. Upon receipt, NVE3 searches the locally saved ARP table for an entry containing the
mapping between VM2's IP and MAC addresses but fails. Therefore, NVE3 sends an
ARP request to NVE2.
9. After the ARP request arrives at Device 2, Device 2 updates the locally saved MAC
address table and broadcasts the ARP request onto the local network segment.
10. After the ARP request arrives at VM2, VM2 updates the locally saved ARP table and
responds to Device 2 with an ARP reply.
11. Upon receipt, Device 2 updates the locally saved MAC address table and responds to
Device 3 with an ARP reply.
12. Upon receipt, Device 3 updates the locally saved ARP table.
13. Before NVE3 sends data packets to VM2, NVE3 searches the locally saved MAC
address table and finds that the outbound interface points to NVE2. Therefore, NVE3
encapsulates the data packets into VXLAN packets as shown in Figure 1-486.
14. NVE3 then forwards the VXLAN packets to Device 2 based on the outer routing table
shown in Figure 1-486. Upon receipt, Device 2 decapsulates the VXLAN packets,
searches the MAC address table, and forwards the data packets to VM2.
The VM2 -> VM1 communication process is similar to the VM1 -> VM2 communication
process. In VM1 -> VM2 communication, the Layer 3 gateway, NVE1, and NVE2 have
updated ARP and MAC address tables, and therefore ARP request transmission is not needed
during VM2 -> VM1 communication. VM2 communicates with VM1 in unicast mode.
VM1 and VM2 on different network segments can now communicate through VXLAN Layer
3 gateways.
1. When VM1 communicates with VM2 for the first time, VM1 sends an ARP request with
the destination MAC address being all Fs.
2. After the ARP request arrives at Device 1, Device 1 updates the locally saved MAC
address table (adds a VNI ID to the MAC address table).
3. Device 1 broadcasts the ARP request onto the local network segment, generates an
EVPN route that carries the MAC advertisement route prefix based on the ARP request,
and advertises the route to its BGP EVPN peer, Device 3.
4. Upon receipt, Device 3 updates the locally saved MAC address table.
5. After the ARP request arrives at Device 3, Device 3 updates the locally saved ARP table
and responds to Device 1 with an ARP reply with the source MAC address being MAC3.
6. Device 3 generates an EVPN route that carries the MAC advertisement route prefix
based on the ARP reply and advertises the route to its BGP EVPN peer, Device 1.
7. Upon receipt, Device 1 updates the locally saved MAC address table.
Device 2 obtains Device 3's MAC address entry in the same way as Device 1.
Ingress replication: After an NVE receives broadcast, unknown unicast, and multicast (BUM) packets,
the local VTEP obtains a list of VTEPs on the same VXLAN segment as itself through the control plane
and sends a copy of the BUM packets to every VTEP in the list.
Ingress replication allows BUM packets to be transmitted in broadcast mode, independent of multicast
routing protocols.
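Ingress replication as described above can be sketched under these assumptions (the data layout and function name are hypothetical): the local VTEP looks up the per-VNI replication list obtained through the control plane and emits one encapsulated copy per remote VTEP, with no underlay multicast involved.

```python
def ingress_replicate(bum_packet: bytes, vni: int, replication_list: dict):
    """Sketch of ingress replication (illustrative): return one copy of
    the BUM packet per remote VTEP in the VNI's replication list. The
    (dest_vtep, vni, packet) tuple stands in for VXLAN encapsulation
    with that outer destination address and VNI."""
    copies = []
    for remote_vtep in replication_list.get(vni, []):
        copies.append((remote_vtep, vni, bum_packet))
    return copies

# Hypothetical replication list: VNI 5000 maps to two remote VTEPs.
repl_list = {5000: ["10.1.1.2", "10.1.1.3"]}
for dst, vni, pkt in ingress_replicate(b"payload", 5000, repl_list):
    print(dst, vni, len(pkt))
```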
a. After Device 1 receives packets from Terminal A, Device 1 determines the Layer 2
broadcast domain of the packets based on the access interface and VLAN
information carried in the packets and checks whether the destination MAC address
is a BUM address.
If the destination MAC address is a BUM address, Device 1 broadcasts the
packets in the Layer 2 broadcast domain and goes to b.
If the destination MAC address is not a BUM address, Device 1 follows the
unicast packet forwarding process.
b. Device 1's VTEP obtains the ingress replication list for the VNI, replicates packets
based on the list, and performs VXLAN tunnel encapsulation by adding outer
headers. Device 1 then forwards the packets through the outbound interface.
c. Upon receipt of the VXLAN packets, the VTEP on Device 2 or Device 3 verifies
the VXLAN packets based on the UDP destination port numbers, source and
destination IP addresses, and VNI. The VTEP obtains the Layer 2 broadcast domain
based on the VNI and removes the outer headers to obtain the inner Layer 2 packets.
It then determines whether the destination MAC address is a BUM address.
If the destination MAC address is a BUM address, the VTEP broadcasts the
packets to the user side in the Layer 2 broadcast domain.
If the destination MAC address is not a BUM address, the VTEP further
checks whether it is a local MAC address.
If it is a local MAC address, the VTEP sends the packets to the device.
If it is not a local MAC address, the VTEP searches for the outbound
interface and encapsulation information in the Layer 2 broadcast domain
and proceeds to d.
d. Device 2 or Device 3 adds VLAN tags to the packets based on the outbound
interface and encapsulation information and forwards the packets to Terminal B or
Terminal C.
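The replication performed in step b above can be sketched as follows. The function and table names are illustrative, not device APIs: for a BUM frame, the local VTEP makes one copy per remote VTEP in the VNI's ingress replication list and adds outer headers (outer IP addresses, UDP destination port 4789, and the VNI).

```python
# Hypothetical sketch of ingress replication; names are illustrative only.

def ingress_replicate(frame, vni, local_vtep, replication_list):
    """Return one (outer headers, frame) copy per remote VTEP for this VNI."""
    copies = []
    for remote_vtep in replication_list.get(vni, []):
        # Outer headers added during VXLAN tunnel encapsulation.
        outer = {"src_ip": local_vtep, "dst_ip": remote_vtep,
                 "udp_dst_port": 4789, "vni": vni}
        copies.append((outer, frame))
    return copies

# Usage: a BUM frame in VNI 10 is replicated to two remote VTEPs.
copies = ingress_replicate(b"bum-frame", 10, "10.1.1.1",
                           {10: ["10.2.2.2", "10.3.3.3"]})
```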
Terminal B or Terminal C responds to Terminal A following the unicast packet forwarding process.
a. After Device 1 receives packets from Terminal A, Device 1 determines the Layer 2
broadcast domain of the packets based on the access interface and VLAN
information carried in the packets and checks whether the destination MAC address
is a unicast address.
In a centralized VXLAN gateway scenario shown in Figure 1-489, the inter-segment packet
forwarding process is as follows:
1. After Device 3 receives VXLAN packets, it decapsulates the packets and checks whether
the destination MAC address in the inner packets is the MAC address of the Layer 3
gateway interface BDIF10.
− If the destination MAC address is a local MAC address, Device 3 forwards the
packets to the Layer 3 gateway on the destination network segment and goes to 2.
− If the destination MAC address is not a local MAC address, Device 3 searches for
the outbound interface and encapsulation information in the Layer 2 broadcast
domain.
2. Device 3 removes the Ethernet headers of the inner packets and parses the destination IP
address. Device 3 searches the routing table for the next-hop IP address based on the
destination IP address and the ARP entries based on the next-hop IP address. Device 3
uses the ARP entries to identify the destination MAC address, VXLAN tunnel's outbound
interface, and VNI.
− If the VXLAN tunnel's outbound interface and VNI cannot be found, Device 3
performs Layer 3 forwarding.
− If the VXLAN tunnel's outbound interface and VNI can be found, Device 3 proceeds
to 3.
3. Device 3 encapsulates the packets into VXLAN packets again, with the source MAC
address in the Ethernet header of the inner packets set to the MAC address of the Layer 3
gateway interface BDIF20.
For details on communication between Device 3 and other devices, see Layer 2 gateway principles.
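The inter-segment forwarding in step 2 above (route lookup, then ARP lookup, then re-encapsulation) can be sketched as follows. The table contents are assumptions for illustration, not real device state.

```python
# Illustrative sketch of the centralized Layer 3 gateway forwarding decision.

ROUTES = {"10.1.20.9": "10.1.20.1"}  # destination IP -> next-hop IP
# next-hop IP -> (destination MAC, VXLAN tunnel outbound interface, VNI)
ARP = {"10.1.20.1": ("aa:bb:cc:dd:ee:02", "Nve1", 20)}

def l3_gateway_forward(dst_ip):
    next_hop = ROUTES.get(dst_ip)
    if next_hop is None:
        return None                      # no route: drop the packet
    entry = ARP.get(next_hop)
    if entry is None:
        return ("l3-forward", None)      # no tunnel info: plain Layer 3 forwarding
    dst_mac, out_intf, vni = entry
    # Tunnel info found: re-encapsulate the packet with the new VNI.
    return ("vxlan-encap", {"dst_mac": dst_mac, "interface": out_intf, "vni": vni})
```

Here the ARP entry supplies everything needed for the second VXLAN encapsulation: the inner destination MAC address, the tunnel's outbound interface, and the VNI of the destination segment.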
Basic Concepts
In the scenario where an enterprise site and data center are interconnected, the VPN GWs
(PE1 and PE2) and the enterprise Site (CPE) are connected through VXLAN tunnels to
exchange L2/L3 services between the enterprise site (CPE) and data center. The data center
gateway (CE1) is dual-homed to PE1 and PE2 to access the VXLAN network, which
enhances network access reliability. When one PE fails, services can be rapidly switched to
the other PE, minimizing the impact on services.
As shown in Figure 1-490, PE1 and PE2 use a virtual address as an NVE interface address at
the network side, namely, the Anycast VTEP address. In this way, the CPE is aware of only
one remote NVE interface and establishes a VXLAN tunnel with the virtual address. The
packets from the CPE can reach CE1 through either PE1 or PE2. However, single-homed CEs
may exist, such as CE2 and CE3. As a result, after reaching a PE, the packets from the CPE
may need to be forwarded by the other PE to a single-homed CE. Therefore, a bypass
VXLAN tunnel needs to be established between PE1 and PE2. An EVPN peer relationship is
established between PE1 and PE2. Different addresses, namely, bypass VTEP addresses, are
configured for PE1 and PE2 so that they can establish a bypass VXLAN tunnel.
Control Plane
PE2 sends a multicast route to PE1. The source address of the route is the Anycast VTEP
address shared by PE1 and PE2. The route carries the bypass VXLAN extended
community attribute, including the bypass VTEP address of PE1.
After receiving the multicast route from PE2, PE1 determines that an anycast
relationship has been established with PE2. This is because the source address
(Anycast VTEP address) of the route is the same as the local virtual address of PE1
and the route carries the bypass VXLAN extended community attribute. Based on this
attribute, PE1 establishes a bypass VXLAN tunnel to PE2.
PE1 learns the MAC addresses of the CEs through upstream packets at the AC side and
advertises the routes to PE2 through BGP EVPN. The routes carry the ESIs of the links
through which the CEs access PE1, information about the VLANs that the CEs access,
and the bypass VXLAN extended community attribute.
PE1 learns the MAC address of the CPE through downstream packets at the network side,
specifies that the next-hop address of the MAC route can be iterated to a static VXLAN
tunnel, and advertises the route to PE2. The next-hop address of the MAC route cannot
be changed.
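The anycast check described above can be sketched as follows; the field names are assumptions for illustration. PE1 treats the sender as an anycast peer when the received route's source address equals PE1's own Anycast VTEP address and the route carries the bypass VXLAN extended community attribute.

```python
# Hedged sketch of the bypass-tunnel decision; field names are illustrative.

LOCAL_ANYCAST_VTEP = "10.0.0.100"   # virtual address shared by PE1 and PE2

def should_build_bypass_tunnel(route):
    # Anycast peer: same source as the local Anycast VTEP address, and the
    # bypass VXLAN extended community attribute is present.
    return (route.get("source") == LOCAL_ANYCAST_VTEP
            and "bypass_vtep" in route)

route_from_pe2 = {"source": "10.0.0.100", "bypass_vtep": "10.0.0.2"}
# When the check passes, PE1 builds a bypass VXLAN tunnel to the address in
# route_from_pe2["bypass_vtep"].
```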
− Downlink
As shown in Figure 1-492:
After receiving a Layer 2 unicast packet sent by the CPE to CE1, PE1 performs
VXLAN decapsulation on the packet, searches the local MAC address table for the
destination MAC address, obtains the outbound interface, and forwards the packet
to CE1.
After receiving a Layer 2 unicast packet sent by the CPE to CE2, PE1 performs
VXLAN decapsulation on the packet, searches the local MAC address table for the
destination MAC address, obtains the outbound interface, and forwards the packet
to CE2.
After receiving a Layer 2 unicast packet sent by the CPE to CE3, PE1 performs
VXLAN decapsulation on the packet, searches the local MAC address table for the
destination MAC address, and forwards it to PE2 over the bypass VXLAN tunnel.
After the packet reaches PE2, PE2 searches its MAC address table for the destination
MAC address, obtains the outbound interface, and forwards the packet to CE3.
The process for PE2 to forward packets from the CPE is the same as that for PE1 to
forward packets from the CPE.
− As shown in Figure 1-494, after a BUM packet from CE2 reaches PE1, PE1 sends a
copy of the packet to CE1 and the CPE. In addition, PE1 sends a copy of the packet
to PE2 through the bypass VXLAN tunnel between PE1 and PE2. After the copy
of the packet reaches PE2, PE2 sends it to CE3, not to the CPE or CE1.
− As shown in Figure 1-495, after a BUM packet from CE1 reaches PE1, PE1 sends a
copy of the packet to CE2 and the CPE. In addition, PE1 sends a copy of the packet
to PE2 through the bypass VXLAN tunnel between PE1 and PE2. After the copy
of the packet reaches PE2, PE2 sends it to CE3, not to the CPE or CE1.
− Uplink
As shown in Figure 1-491:
Because the CPE is on a different network segment from PE1 and PE2, the
destination MAC address of a Layer 3 unicast packet sent from CE1, CE2, or CE3
to the CPE is the MAC address of the BDIF interface on the Layer 3 gateway of
PE1 or PE2. After receiving the packet, PE1 or PE2 removes the Layer 2 tag from
the packet, searches for a matching Layer 3 routing entry, and obtains the outbound
interface that is the BDIF interface connecting the CPE to the Layer 3 gateway. The
BDIF interface searches the ARP table, obtains the destination MAC address,
encapsulates the packet into a VXLAN packet, and sends it to the CPE through the
VXLAN tunnel.
After receiving the Layer 3 packet from PE1 or PE2, the CPE removes the Layer 2
tag from the packet because the destination MAC address is the MAC address of
the BDIF interface on the CPE. Then the CPE searches the Layer 3 routing table to
obtain a next-hop address to forward the packet.
− Downlink
As shown in Figure 1-492:
Before sending a Layer 3 unicast packet to CE1 across subnets, the CPE searches its
Layer 3 routing table and obtains the outbound interface that is the BDIF interface
on the Layer 3 gateway connecting to PE1. The BDIF interface searches the ARP
table to obtain the destination MAC address, encapsulates the packet into a VXLAN
packet, and forwards it to PE1 over the VXLAN tunnel.
After receiving the packet from the CPE, PE1 removes the Layer 2 tag from the
packet because the destination address of the packet is the MAC address of PE1's
BDIF interface. Then PE1 searches the Layer 3 routing table and obtains the
outbound interface that is the BDIF interface connecting PE1 to its attached CE.
The BDIF interface searches its ARP table and obtains the destination address,
performs Layer-2 encapsulation for the packet, and sends it to CE1.
The process for PE2 to forward packets from the CPE is the same as that for PE1 to
forward packets from the CPE.
1.7.11.3 Applications
1.7.11.3.1 Application for Communication Between Terminal Users on a VXLAN
Service Description
Currently, data centers are expanding on a large scale for enterprises and carriers, with
increasing deployment of virtualization and cloud computing. In addition, to accommodate
more services while reducing maintenance costs, data centers are employing large Layer 2 and
virtualization technologies.
As server virtualization is implemented in the physical network infrastructure for data centers,
VXLAN, an NVO3 technology, has adapted to the trend by providing virtualization solutions
for data centers.
Networking Description
On the network shown in Figure 1-496, an enterprise has VMs deployed in different data
centers. Different network segments run different services. The VMs running the same service
or different services in different data centers need to communicate with each other. For
example, VMs of the financial department residing on the same network segment need to
communicate, and VMs of the financial and engineering departments residing on different
network segments also need to communicate.
Feature Deployment
As shown in Figure 1-496:
Deploy Device 1 and Device 2 as Layer 2 VXLAN gateways and establish a VXLAN
tunnel between Device 1 and Device 2 to allow communication between terminal users
on the same network segment.
Deploy Device 3 as a Layer 3 VXLAN gateway and establish a VXLAN tunnel between
Device 1 and Device 3 and between Device 2 and Device 3 to allow communication
between terminal users on different network segments.
Configure VXLAN on devices to trigger VXLAN tunnel establishment and dynamic learning
of ARP and MAC address entries. Terminal users on the same network segment and on
different network segments can then communicate through the Layer 2 and Layer 3
VXLAN gateways based on ARP and routing entries.
Service Description
Currently, data centers are expanding on a large scale for enterprises and carriers, with
increasing deployment of virtualization and cloud computing. In addition, to accommodate
more services while reducing maintenance costs, data centers are employing large Layer 2 and
virtualization technologies.
As server virtualization is implemented in the physical network infrastructure for data centers,
VXLAN, an NVO3 technology, has adapted to the trend by providing virtualization solutions
for data centers, allowing intra-VXLAN communication and communication between
VXLANs and legacy networks.
Networking Description
On the network shown in Figure 1-497, an enterprise has VMs deployed for the finance and
engineering departments and a legacy network for the human resource department. The
finance and engineering departments need to communicate with the human resource
department.
Figure 1-497 Communication between terminal users on a VXLAN and legacy network
Feature Deployment
As shown in Figure 1-497:
Deploy Device 1 and Device 2 as Layer 2 VXLAN gateways and Device 3 as a Layer 3
VXLAN gateway. The VXLAN gateways are VXLANs' edge devices connecting to legacy
networks and are responsible for VXLAN encapsulation and decapsulation. Establish a
VXLAN tunnel between Device 1 and Device 3 and between Device 2 and Device 3 for
VXLAN packet transmission.
When the human resource department sends a packet to VM1 of the financial department, the
process is as follows:
1. Device 1 receives the packet and encapsulates it into a VXLAN packet before sending it
to Device 3.
2. Upon receipt, Device 3 decapsulates the VXLAN packet and removes the Ethernet
header in the inner packet, parses the destination IP address, and searches the routing
table for a next hop address. Then, Device 3 searches the ARP table based on the next
hop address to determine the destination MAC address, VXLAN tunnel's outbound
interface, and VNI.
3. Device 3 encapsulates the VXLAN tunnel's outbound interface and VNI into the packet
and sends the VXLAN packet to Device 2.
4. Upon receipt, Device 2 decapsulates the VXLAN packet, finds the outbound interface
based on the destination MAC address, and forwards the packet to VM1.
Service Description
In legacy networking, a centralized Layer 3 gateway is deployed on an aggregation or spine
node. Packets across different networks must be forwarded through the centralized Layer 3
gateway, resulting in the following problems:
Forwarding paths are not optimal. Layer 3 traffic of data centers in different locations
must be transmitted to the centralized Layer 3 gateway for forwarding.
The ARP entry specification is a bottleneck. ARP entries must be generated for tenants
on the centralized Layer 3 gateway. However, the centralized Layer 3 gateway can only
have a limited number of ARP entries configured, which does not facilitate data center
network expansion.
Distributed VXLAN gateways can be configured to address these problems. In distributed
VXLAN gateway networking, leaf nodes, which can function as Layer 3 VXLAN gateways,
are used as VTEPs to establish VXLAN tunnels. Spine nodes are unaware of the VXLAN
tunnels and only forward VXLAN packets.
Networking Description
On the network shown in Figure 1-498, Server1 and Server2 on different network segments
both connect to Leaf1. Configure Leaf1 as a Layer 3 VXLAN gateway. When Server1 and
Server2 communicate, traffic is forwarded through only Leaf1, not any spine node.
Server1 and Server3 on different network segments connect to Leaf1 and Leaf2, respectively.
Configure both Leaf1 and Leaf2 as Layer 3 VXLAN gateways. When Server1 and Server3
communicate, traffic is forwarded through the VXLAN tunnel established between Leaf1 and
Leaf2. Spine nodes are unaware of the VXLAN tunnel and only forward VXLAN packets.
Feature Deployment
Deploy both Layer 2 and Layer 3 VXLAN gateways on leaf nodes.
Layer 2 gateway: allows tenant access to VXLANs and intra-subnet VXLAN
communication on the same network segment.
Layer 3 gateway: allows inter-subnet VXLAN communication and access to external
networks.
Note the following when deploying distributed VXLAN gateways:
When ARP broadcast suppression is enabled on a leaf node functioning as a Layer 2
VXLAN gateway, the leaf node can determine whether to broadcast ARP request
messages sent from tenants or servers. This function suppresses ARP broadcast traffic,
which improves network performance.
When advertisement of host routes generated based on ARP entries is enabled on a leaf
node functioning as a Layer 3 VXLAN gateway, the leaf node learns ARP entries of
tenants, generates host routes based on the ARP entries, and uses BGP to advertise the
host routes to BGP peers.
When traffic is transmitted across leaf nodes at Layer 3, tenants must be bound to VPN
instances. VXLAN tunnels can then be established through BGP VPN peers.
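The second deployment point above, generating host routes from ARP entries, can be sketched as follows. This is a simplified illustration, not the device implementation: a leaf node derives /32 host routes from the tenant ARP entries it has learned and would then advertise them to its BGP peers.

```python
# Simplified illustration of host-route generation from learned ARP entries.

def host_routes_from_arp(arp_table):
    """Map each learned tenant IP address to a /32 host route."""
    return [f"{ip}/32" for ip in arp_table]

# ARP table learned from tenants (IP address -> MAC address).
routes = host_routes_from_arp({"192.168.1.10": "aa:bb:cc:00:00:01",
                               "192.168.1.11": "aa:bb:cc:00:00:02"})
# routes now holds the host routes the leaf would advertise via BGP.
```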
On the network shown in Figure 1-499, the device deployed at the network edge establishes a
VXLAN tunnel with a virtual BRAS for user access.
1. After a user terminal starts or an IPoE, PPPoE, or L2TP user dials up, the terminal sends
an access message, which is relayed to the edge device through an optical line terminal
(OLT).
2. The edge device encapsulates the access message with a VXLAN header to form a
VXLAN packet and transparently transmits it to the BRAS through a VXLAN tunnel.
3. The BRAS removes the VXLAN header of the received VXLAN packet and processes
the access message.
On the network shown in Figure 1-500, a VPLS network and a VXLAN network intersect at
Device 2 and Device 4 for BRAS access. The user access implementation is as follows:
1. After a user terminal starts or an IPoE, PPPoE, or L2TP user dials up, the terminal sends
an access message, which is relayed to edge devices (Device 1 and Device 3) through
OLTs.
2. Device 1 and Device 3 create a VSI for each OLT so that each OLT is identified by a
VSI. Device 1 through Device 4 internetwork using VPLS.
3. Device 2 and Device 4 back up each other, with Device 2 the master and Device 4 the
backup. VSIs are mapped to VXLAN VNIs in 1:1 mode. Device 2 and Device 4 have the
same VTEP IP address configured to exchange packets between the PW and VXLAN
tunnel.
4. Device 2 and Device 4 have a VRRP backup group configured to implement link
protection in case the link between Device 2 and BRAS 1 fails.
5. VRRP is associated with the virtual VTEP's route priority on the Device 2 and Device 4
interfaces connecting to the BRASs, and the route priority of the virtual VTEP on the
master device is higher than that of the virtual VTEP on the backup device. Downstream
VXLAN traffic of the BRASs is transmitted through the master device. After
downstream traffic is transmitted to Device 1 and Device 3, their MAC address entries
are updated for guiding upstream traffic to the master device.
6. VRRP is associated with PWs on the Device 2 and Device 4 interfaces connecting to the
BRASs so that the PW interface of the backup device does not receive or forward
VXLAN traffic. User access packets are broadcast to both Device 2 and Device 4 in the
VSI. Because PW packets are blocked on the backup device, only the master device
forwards the user access packets.
7. VRRP is associated with the Device 2 and Device 4 interfaces connecting to the VPLS
network. If link S on the VPLS network fails, protection switching is performed.
8. Device 2 and Device 4 establish VXLAN tunnels with the BRASs for VXLAN packet
encapsulation and decapsulation, implementing BRAS access.
Service Description
In the scenario where an enterprise site and data center are interconnected, the data center
gateway (CE1) is dual-homed to PE1 and PE2 to access the VXLAN network to interconnect
with the remote enterprise site (CPE), which brings the following benefits:
Benefits to carriers: The network access reliability is enhanced because the CE is
dual-homed to PEs.
Benefits to users: Users are not aware of service failures because services can quickly
recover from a failure.
Networking Description
As shown in Figure 1-501, PE1 and PE2 use a virtual address as an NVE interface address at
the network side, namely, the Anycast VTEP address. In this way, the CPE is aware of only
one remote NVE interface and establishes a VXLAN tunnel with the virtual address. The
packets from the CPE can reach CE1 through either PE1 or PE2. An EVPN peer relationship
is established between PE1 and PE2. Different addresses, namely, bypass VTEP addresses, are
configured for PE1 and PE2 so that they can establish a bypass VXLAN tunnel.
Feature Deployment
As shown in Figure 1-501:
At the AC side, a CE is dual-homed to PEs.
Terms
Term Description
NVO3 Network Virtualization over L3. A network virtualization technology
implemented at Layer 3 to provide traffic isolation and IP address
independence between data center tenants so that independent Layer 2
subnets can be provided for each tenant. In addition, NVO3 supports VM
deployment and migration on tenants' Layer 2 subnets.
VXLAN Virtual extensible local area network. An NVO3 network virtualization
technology that encapsulates data packets sent from VMs into UDP
packets and encapsulates IP and MAC addresses used on the physical
network in the outer headers before sending the packets over an IP
network. The egress tunnel endpoint then decapsulates the packets and
sends the packets to the destination VM.
BD bridge domain
BUM broadcast, unknown unicast, and multicast
VNI VXLAN Network Identifier
VTEP VXLAN Tunnel Endpoints
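To make the VXLAN definition above concrete, the following sketch builds the 8-byte VXLAN header defined in RFC 7348: a flags byte with the I bit set (meaning the VNI is valid), reserved fields, and a 24-bit VNI. This header precedes the original Ethernet frame inside the outer UDP packet (destination port 4789).

```python
# Build the 8-byte VXLAN header from RFC 7348.
import struct

def vxlan_header(vni):
    flags = 0x08 << 24                 # I flag set; other flag/reserved bits zero
    # The 24-bit VNI occupies the upper three bytes of the second word;
    # the lowest byte is reserved.
    return struct.pack("!II", flags, vni << 8)

hdr = vxlan_header(5000)               # 8 bytes: 08 00 00 00 | 00 13 88 00
```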
Definition
Data Center Interconnection (DCI) provides solutions to interconnect data centers.
Using Virtual extensible local area network (VXLAN), Ethernet virtual private network
(EVPN), and BGP/MPLS IP VPN technologies, DCI solutions allow packets that are
exchanged between data centers to be transmitted securely and reliably over carrier networks,
allowing VMs in different data centers to communicate with each other.
Purpose
To meet the requirements of cross-region operation, user access, and inter-city disaster
recovery that arise during enterprise development, an increasing number of enterprises have
deployed data centers in multiple regions and across carrier networks. Currently, leased fibers
or leased lines are commonly used to interconnect cross-region data centers, which has the
following disadvantages:
For enterprises, leased fibers or leased lines are costly.
For carriers, service exploration is difficult, and resource utilization is low.
To cope with these disadvantages, a DCI network that is characterized by high security and
reliability and flexible scheduling needs to be constructed and operated. DCI solutions can be
deployed on the carrier network to allow packets to be transmitted securely and reliably
between data centers and maximize resource utilization.
Benefits
DCI solutions provide the following benefits:
A data center interconnection network that is characterized by high security and
reliability and flexible scheduling for cross-region data center operation
Tenant-based differentiated services, which help implement flexible resource scheduling
and reduce costs
1.7.12.2 Principles
1.7.12.2.1 Control Plane
DCI solutions are responsible for Layer 3 route advertisement and Layer 2 route
advertisement on the control plane. DCI-related concepts are described as follows.
Concept Description
Overlay network An overlay network is a logical network
deployed over a physical network and can be
regarded as a network connected through
virtual or logical links.
An overlay network has its own control plane
and forwarding plane.
An overlay network is a step forward for a
physical network towards cloud and
virtualization. An overlay network is critical
for cloud network convergence because it
frees cloud resource pool capabilities from
various restrictions of the physical network.
Table 1-133 Routes on the data center and carrier network sides
In DCI solutions, a carrier network can carry Layer 3 traffic, in both integrated and separated
deployment scenarios. In the two scenarios, Layer 3 route advertisement processes on the DCI
backbone network are the same. This section describes the Layer 3 route advertisement
process only in the integrated deployment scenario. Figure 1-502 shows the networking of the
integrated deployment scenario.
Figure 1-503 illustrates the Layer 3 route advertisement process. The detailed process is
described as follows:
1. After receiving a VM host route from Device 1, DCI-PE1-GW1 parses the route,
regardless of whether it is an IRB or IP prefix route.
2. Based on the RT of the VM host route, DCI-PE1-GW1 crosses the EVPN route to a
local VPN instance.
3. DCI-PE1-GW1 changes the next hop of the EVPN route to the IP address used to
establish the VPNv4 peer relationship, performs re-encapsulation, and replaces the RD
and RT of the EVPN route with the RD and RT of the L3VPN instance, respectively. In
addition, DCI-PE1-GW1 applies for an MPLS label and sends the VPNv4 route to
DCI-PE2-GW2.
4. Based on the RT of the VPNv4 route, DCI-PE2-GW2 crosses the VPNv4 route to a local
VPN instance.
5. DCI-PE2-GW2 changes the next hop of the VPNv4 route to the local VTEP address,
performs re-encapsulation, and replaces the RD and RT of the VPNv4 route with the RD
and RT of the L3VPN instance, respectively. In addition, DCI-PE2-GW2 adds the L3VNI
and sends the EVPN route to Device 2.
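Steps 3 and 5 above both rewrite route attributes on re-advertisement. A minimal sketch, with illustrative dictionary fields: the DCI-PE rewrites the next hop, replaces the RD and RT with those of the local instance, and attaches a new label (or VNI).

```python
# Minimal sketch of the attribute rewrite performed on re-advertisement.

def readvertise(route, next_hop, rd, rt, label):
    new_route = dict(route)            # keep the original route intact
    new_route.update({"next_hop": next_hop, "rd": rd, "rt": rt,
                      "label": label})
    return new_route

# Hypothetical EVPN route received from the data center side.
evpn = {"prefix": "10.1.1.10/32", "next_hop": "1.1.1.1",
        "rd": "100:1", "rt": "100:1", "label": 5010}
# Step 3: the route is re-advertised as a VPNv4 route with the L3VPN
# instance's RD/RT, a rewritten next hop, and a newly applied MPLS label.
vpnv4 = readvertise(evpn, "2.2.2.2", "200:1", "200:1", 20010)
```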
Table 1-134 Routes on the data center and carrier network sides
In DCI solutions, a carrier network can carry Layer 2 traffic only in the integrated deployment
scenario.
Figure 1-504 illustrates the Layer 2 route advertisement process. The detailed process is
described as follows:
1. After receiving a VM host MAC route from Device 1, DCI-PE1-GW1 parses and learns
the route.
2. Based on the RT of the VM host MAC route, DCI-PE1-GW1 crosses the EVPN route to
a local EVPN instance.
3. DCI-PE1-GW1 changes the next hop of the EVPN route to the IP address used to
establish the EVPN peer relationship, performs re-encapsulation, and replaces the RD
and RT of the VXLAN-encapsulated EVPN route with the RD and RT of the EVPN
instance, respectively. In addition, DCI-PE1-GW1 applies for an MPLS label and sends
the EVPN route to DCI-PE2-GW2.
4. Based on the RT of the EVPN route, DCI-PE2-GW2 crosses the EVPN route to a local
EVPN instance.
5. DCI-PE2-GW2 changes the next hop of the EVPN route to the local VTEP IP address,
performs re-encapsulation, and replaces the RD and RT of the EVPN route with the RD
and RT of the EVPN instance, respectively. In addition, DCI-PE2-GW2 adds the L2VNI
and sends the EVPN route to Device 2.
On the network shown in Figure 1-505, Layer 3 traffic forwarding on the data plane is
described as follows:
1. After receiving a VXLAN packet destined for a remote VM host from Device 1 in data
center A, DCI-PE1-GW1 parses the packet and obtains the corresponding VPN instance
according to the VNI carried in the packet. In addition, DCI-PE1-GW1 searches the VPN
instance for the outbound interface and encapsulation information based on the
destination IP address of the VM host. Because the outbound interface is an MPLS
tunnel interface, DCI-PE1-GW1 encapsulates the inner Layer 3 packet using MPLS and
sends the MPLS packet through the MPLS tunnel over the backbone network.
2. After DCI-PE2-GW2 receives the double-tagged MPLS packet, it parses the packet
using MPLS, removes the outer MPLS public network label, and obtains the
corresponding VPN instance based on the VPN label. Then, DCI-PE2-GW2 searches the
VPN forwarding table based on the destination IP address of the VM host. Because the
next hop is a VXLAN tunnel interface and the VTEP of the VXLAN tunnel is Device 2
in data center B, DCI-PE2-GW2 encapsulates the original data packets and attributes
such as the L3VNI and Router-MAC into a VXLAN packet and sends it to Device 2.
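The stitching decision in the two steps above can be sketched as follows, with assumed table contents: the VNI selects the VPN instance, and the forwarding entry found there determines whether the packet leaves with MPLS or VXLAN encapsulation.

```python
# Sketch of the MPLS/VXLAN stitching lookup on a DCI-PE; tables are assumed.

VNI_TO_VPN = {5010: "vpn-a"}                     # L3VNI -> VPN instance
FIB = {("vpn-a", "10.2.1.9"): ("mpls", 20010),   # toward the backbone: MPLS label
       ("vpn-a", "10.1.1.9"): ("vxlan", 5010)}   # toward the data center: VNI

def stitch(vni, dst_ip):
    vpn = VNI_TO_VPN[vni]                        # VNI identifies the VPN instance
    return FIB[(vpn, dst_ip)]                    # entry decides the encapsulation
```

On DCI-PE1-GW1 the outbound interface is an MPLS tunnel interface, so the packet is re-encapsulated with MPLS; on DCI-PE2-GW2 it is a VXLAN tunnel interface, so the packet is re-encapsulated with VXLAN.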
On the network shown in Figure 1-505, Layer 2 traffic forwarding on the data plane is
described as follows:
1. After receiving a VXLAN packet carrying a VM MAC route from Device 1 in data
center A, DCI-PE1-GW1 parses the packet and obtains the corresponding Layer 2
broadcast domain according to the VNI carried in the packet. In addition, DCI-PE1-GW1
searches the Layer 2 broadcast domain for the outbound interface and encapsulation
information based on the destination MAC address of the VM host. Because the
outbound interface is an MPLS tunnel interface, DCI-PE1-GW1 encapsulates the inner
Layer 2 packet using MPLS and sends the MPLS packet through the MPLS tunnel over
the backbone network.
2. After DCI-PE2-GW2 receives the MPLS packet, it parses the packet using MPLS,
removes the outer MPLS public network label, and obtains the Layer 2 broadcast domain
based on the EVPN label and BD ID. Then, DCI-PE2-GW2 searches the Layer 2
broadcast domain based on the destination MAC address of the VM host. Because the
outbound interface is a VXLAN tunnel interface and the VTEP of the VXLAN tunnel is
Device 2 in data center B, DCI-PE2-GW2 performs VXLAN encapsulation based on the
VXLAN tunnel information and sends the VXLAN packet to Device 2.
1.7.12.3 Applications
1.7.12.3.1 Application of an End-to-End Overlay VXLAN Tunnel
Service Description
GWs and DCI-PEs are separately deployed. DCI-PEs function as edge devices on the
underlay network and ensure VTEPs in data centers are reachable through routes, without
saving data center tenant and host information.
Networking Description
In Figure 1-506, data center gateways GW1 and GW2 are connected to the backbone network.
BGP/MPLS IP VPN functions are deployed on the DCI backbone network to transmit VTEP
IP information between GW1 and GW2. A VXLAN tunnel is established between GW1 and
GW2 for inter-data center E2E VXLAN packet encapsulation and VM communication.
Feature Deployment
IP and an IGP are deployed on the carrier network to ensure reachability between
DCI-PEs at the network layer.
MPLS is deployed on the carrier network, and an LDP LSP or TE LSP is established
between the DCI-PEs.
BGP/MPLS IP VPN functions are deployed on DCI-PEs.
Service Description
The solution of Option A VLAN Layer 3 access to DCI applies to the scenario where data
centers that do not support VXLAN are interconnected through a DCI network. This solution
has low requirements on GWs, but only a maximum of 4096 VLANs are available to this
solution.
GWs and DCI-PEs are separately deployed. Each DCI-PE considers the GW of a data center
as a CE, uses a Layer 3 VPN routing protocol to receive VM host routes from the data center,
and saves and maintains the routes.
Networking Description
In Figure 1-507, VXLAN tunnels are established within data centers to allow intra-DC VM
communication. To allow inter-data center VM communication, BGP/MPLS IP VPN
functions are deployed on the DCI backbone network, and a Layer 3 Ethernet sub-interface is
configured on each DCI-PE, added to the same VLAN, and bound to the VPN instance of
each DCI-PE.
Feature Deployment
IP and an IGP are deployed on the carrier network to ensure reachability between
DCI-PEs at the network layer.
MPLS is deployed on the carrier network, and an LDP LSP or TE LSP is established
between the DCI-PEs.
BGP/MPLS IP VPN functions are deployed on DCI-PEs.
A Layer 3 Ethernet sub-interface is created on each DCI-PE and is associated with a
VLAN.
Service Description
GWs and DCI-PEs are separately deployed. EVPN is used as a control plane protocol to
dynamically establish VXLAN tunnels. VPNv4 is used to send received host IP routes to the
peer DCI-PE, and packets of VM hosts can be forwarded at Layer 3.
Networking Description
In Figure 1-508, data center gateway devices GW1 and GW2 are connected to the DCI
backbone network. To allow inter-data center VM communication, BGP/MPLS IP VPN
functions are deployed on the DCI backbone network. In addition, EVPN and a VXLAN
tunnel are deployed between the GW and DCI-PE to transmit VM host routes so that VMs in
different data centers can communicate with each other.
Feature Deployment
IP and an IGP are deployed on the carrier network to ensure reachability between
DCI-PEs at the network layer.
MPLS is deployed on the carrier network, and an LDP LSP or TE LSP is established
between the DCI-PEs.
BGP/MPLS IP VPN functions are deployed on DCI-PEs.
EVPN is deployed between the GW and DCI to transmit routes and establish a VXLAN
tunnel.
Service Description
Each DCI-PE-GW functions not only as a data center gateway but also as a PE on the carrier
network. Each DCI-PE-GW learns VM host routes or MAC routes from a data center through
EVPN and forwards packets of VM hosts at Layer 3 or Layer 2.
Networking Description
In Figure 1-509, DCI-PE-GWs function as both data center gateways and MPLS PEs.
DCI-PE-GWs directly connect to data center devices and the Ps on the DCI backbone
network. To allow intra-data center VM communication, a VXLAN tunnel must be
established within each DC. To allow inter-data center VM communication, BGP/MPLS IP
VPN or EVPN functions must be deployed on the DCI backbone network, and EVPN and
VXLAN tunnels must be deployed between DCI-PE-GWs and data center devices to transmit
VM host IP routes or MAC routes.
Feature Deployment
IP and an IGP are deployed on the carrier network to ensure reachability between
DCI-PE-GWs at the network layer.
MPLS is deployed on the carrier network, and an LDP LSP or TE LSP is established
between the DCI-PE-GWs.
On DCI-PE-GWs, BGP/MPLS IP VPN functions are enabled to bear VM host routes, or
EVPN functions are enabled to bear VM MAC routes.
EVPN is deployed on each DCI-PE-GW as a control plane protocol to dynamically
establish a VXLAN tunnel to the connected device in the data center.
Terms
Term Definition
EVC Ethernet Virtual Connection, a model defined by the
Metro Ethernet Forum (MEF) and used to transmit
Ethernet services on metropolitan transport networks.
An EVC is a model, rather than a specific service or
technique.
1.7.13.1 Introduction
Definition
Proactive loop detection detects and eliminates Layer 2 network loops. When a device's
Ethernet or Eth-Trunk interface physically goes Up or an interface is bound to a VSI, the
device proactively detects and eliminates loops, if any.
Purpose
If a device's Ethernet or Eth-Trunk interface goes Up due to a misoperation, or an interface is
bound to a VSI, and the interface incurs a loop, services on the device may be interrupted or
the device may even escape NMS control. To resolve this loop problem and ensure normal
device operation, Huawei developed proactive loop detection upon interface Up. This feature
allows a device's interface to proactively send loop detection packets. If the interface detects a
loop, the device blocks the interface.
1.7.13.2 Principles
1.7.13.2.1 Proactive Loop Detection
Triggering Condition
Interface going Up
If an Ethernet interface, Ethernet trunk interface, or a specified Ethernet trunk member
interface physically goes Up, the proactive loop detection function is triggered to detect
whether the Ethernet interface, all members of the Ethernet trunk interface, or the
specified Ethernet trunk member has a loop. If they have a loop, this function sets them
to Down. Note that if an Ethernet interface goes Down, its associated sub-interfaces also
go Down.
Interface bound to a VSI
If an Ethernet interface, Ethernet sub-interface, Ethernet trunk interface, or Ethernet
trunk sub-interface is bound to a VSI, proactive loop detection is triggered on the
interface to detect whether it has a loop.
Detection Principles
When a device's Ethernet or Eth-Trunk interface goes Up or an interface is bound to a VSI,
the interface proactively sends a loop detection packet. If the device receives the loop
detection packet sent through a VPLS domain within the configured period, a loop occurs on
the network. In this case, the device blocks the interface sending the loop detection packet and
reports an alarm.
On the network shown in Figure 1-510, AC1 sends a loop detection packet. If AC2 receives
this packet within a loop detection period, a loop occurs on the network.
Proactive loop detection applies only to VPLS scenarios, not VLAN scenarios.
It is recommended that you disable this function on properly running devices. If you have to
use this function to detect whether links operate properly during site deployment, be sure to
disable this function after this stage.
In Figure 1-511, t1 = t2 = t3 = 3s, and t4 = t5 = 10s. AC2 processes only the most recently
sent loop detection packet. For example, if AC1 sends a loop detection packet at second a,
AC2 checks, upon receiving a packet, whether that packet is the one AC1 sent at second a.
If so, a loop occurs. The device then sets the link layer protocol of AC1 to Down and
reports an alarm to the NMS.
If not, the device ignores the packet and waits for the next loop detection packet.
A loop detection packet carries at most two VLAN tags (for example, in a QinQ VLAN tag termination
scenario).
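The timestamp-matching check above can be sketched as follows. This is a minimal illustration, not device firmware; the class name, fields, and the single-object modeling of both AC endpoints are assumptions.

```python
# Hypothetical sketch of the proactive loop detection check: only the most
# recently sent probe counts as evidence of a loop.

DETECT_PERIOD = 3  # seconds between loop detection packets (t1 = t2 = t3 = 3s)

class LoopDetector:
    def __init__(self):
        self.last_sent_at = None   # send time of the most recent probe from AC1
        self.blocked = False

    def send_probe(self, now):
        """AC1 side: send a loop detection packet stamped with the send time."""
        self.last_sent_at = now
        return {"type": "loop-detect", "sent_at": now}

    def on_receive(self, packet):
        """AC2 side: a loop exists only if the received probe is the latest one."""
        if packet["type"] != "loop-detect":
            return False
        if packet["sent_at"] == self.last_sent_at:
            # The probe came back through the VPLS domain: loop detected.
            self.blocked = True      # set the AC's link layer protocol to Down
            return True              # and report an alarm to the NMS
        return False                 # stale probe: keep waiting

detector = LoopDetector()
probe = detector.send_probe(now=0)
# feeding the probe back models AC2 receiving it through the VPLS domain
assert detector.on_receive(probe) is True and detector.blocked
```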
1.7.13.3 Application
1.7.13.3.1 AC Interface Receiving a Loop Detection Packet
In Figure 1-513, PE1's AC1 is an Ethernet interface. After AC1 goes physically Up, it
proactively sends a loop detection packet. If AC2 receives this packet, a loop occurs on the
network. PE1 then sets the link layer protocol of AC1 to Down and reports an alarm to the
NMS. This mechanism prevents AC1 from sending or receiving any packets.
In Figure 1-514, PE1's AC interface is an Ethernet interface. After the AC interface goes
physically Up, it proactively sends a loop detection packet. If this packet loops back to PE1
through Switch, PE2, and the PW between PE1 and PE2, a loop occurs on the network. PE1
then sets the link layer protocol of the AC interface to Down and reports an alarm to the NMS.
This mechanism prevents the AC interface from sending or receiving any packets.
1.7.14 RRPP
1.7.14.1 Principles
1.7.14.1.1 Basic Concepts
Basic Concepts
Ethernet devices can be configured as nodes with different roles on an RRPP ring. RRPP ring
nodes exchange and process RRPP packets to detect the status of the ring network and
communicate any topology changes throughout the network. The master node on the ring
blocks or unblocks its secondary port depending on the status of the ring network. If a device
or link on the ring network fails, the backup link is activated immediately to ensure
uninterrupted services.
RRPP ring
An RRPP ring consists of interconnected devices configured with the same control
VLAN. An RRPP network has a major ring and sub-rings. Sub-ring protocol packets are
transmitted through the major ring as data packets; major ring protocol packets are
transmitted only within the major ring.
Control VLAN
The control VLAN is a concept relative to the data VLAN. In an RRPP ring, a control
VLAN is used to transmit only RRPP packets, whereas a data VLAN is used to transmit
data packets.
Node type
Master node: The master node determines how to handle topology changes. Each RRPP
ring has exactly one master node. Any device on the Ethernet ring can serve as the
master node.
Transit node: On an RRPP ring, all nodes except the master node are transit nodes. Each
transit node monitors the status of its directly connected RRPP link and notifies the
master node of any changes in link status.
Edge node and assistant edge node: A device can serve as an edge node or assistant
edge node on a sub-ring while serving as a transit node on the major ring. On an RRPP
sub-ring, either of the two nodes at which the sub-ring intersects the major ring can be
specified as the edge node; the other node then becomes the assistant edge node. Each
sub-ring has exactly one edge node and one assistant edge node.
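The master node's block/unblock behavior described above can be sketched as a small state machine. The class, state names, and the boolean notification interface are illustrative assumptions, not an actual RRPP implementation.

```python
# Illustrative sketch of the RRPP master node: block the secondary port while
# the ring is complete (to break the loop), unblock it when the ring fails
# (to activate the backup link).

class MasterNode:
    def __init__(self):
        self.ring_state = "complete"       # ring health as seen by the master
        self.secondary_port_blocked = True  # loop broken on a healthy ring

    def on_link_change(self, ring_ok):
        """Transit nodes notify the master of changes in link status."""
        if ring_ok:
            # Ring restored: block the secondary port again to eliminate the loop.
            self.ring_state = "complete"
            self.secondary_port_blocked = True
        else:
            # Ring failed: unblock the secondary port so the backup link
            # carries traffic.
            self.ring_state = "failed"
            self.secondary_port_blocked = False

master = MasterNode()
master.on_link_change(ring_ok=False)  # a device or link on the ring fails
# the secondary port is now unblocked, activating the backup path
```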
RRPP Packets
− MAJOR-FAULT = 0x0b
SYSTEM_MAC_ADDR: indicates the bridge MAC address from which the packet is
sent. This field occupies 48 bits.
RRPP snooping is enabled on the sub-interface or VLANIF interface of NPE D and associated
with other VSIs on the local device. When the RRPP ring fails, NPE D on the VPLS network
clears the forwarding entries of the VSIs (including the associated VSIs) on the local node and
the forwarding entries of the remote NPE B to re-learn forwarding entries. This ensures that
traffic can be switched to a normal path and downstream traffic is forwarded normally.
As shown in Figure 1-517, when the link between NPE D and UPE A fails, the RRPP master
node UPE A sends a COMMON-FLUSH-FDB packet to notify the transit nodes on the RRPP
ring to clear their MAC address tables.
Figure 1-517 Association between RRPP and VPLS (RRPP ring fault)
Because NPE D cannot process the COMMON-FLUSH-FDB packet, its original MAC
address table is not cleared. If a downstream data packet destined for UPE A arrives, NPE D
sends it to UPE A along the original path, interrupting downstream traffic between NPE D
and UPE A. After UPE B clears its MAC address table, an upstream data packet sent by UPE
A is regarded as an unknown unicast packet and is forwarded to the VPLS network along the
path UPE A->UPE B->NPE D. After re-learning the MAC address, NPE D can correctly
forward the downstream traffic destined for UPE A.
When the RRPP ring recovers from the fault, UPE A, the master node, sends a
COMPLETE-FLUSH-FDB packet to notify the transit nodes to clear their MAC address
tables. The downstream traffic between NPE D and UPE A is interrupted because NPE D
cannot process the COMPLETE-FLUSH-FDB packet.
As shown in Figure 1-518, after RRPP snooping is enabled on sub-interfaces GE 1/0/0.100
and GE 2/0/0.100 of NPE D, NPE D can process the COMMON-FLUSH-FDB or
COMPLETE-FLUSH-FDB packet.
Figure 1-518 Association between RRPP and VPLS (enabling the RRPP snooping)
When the RRPP ring topology changes and NPE D receives the COMMON-FLUSH-FDB or
COMPLETE-FLUSH-FDB packet from the master node UPE A, NPE D clears the MAC
address table of the VSI associated with sub-interfaces GE 1/0/0.100 and GE 2/0/0.100 and
then notifies other NPEs in this VSI to clear their MAC address tables also.
If a downstream data packet destined for UPE A arrives, NPE D cannot find a matching MAC
address entry, so the packet is regarded as an unknown unicast packet, broadcast in the VSI,
and sent to UPE A along the path NPE D->UPE B->UPE A. This ensures continuity of
downstream traffic.
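The flush handling above can be sketched as follows. The data structures (`vsi_mac_tables`, the associated-VSI list, and the remote-notification callback) are hypothetical illustrations of the behavior, not device internals.

```python
# Hedged sketch of RRPP snooping on an NPE: on a flush packet from the RRPP
# master, clear the MAC tables of the snooping VSI and its associated VSIs,
# then tell remote NPEs in the VSI to clear theirs as well.

FLUSH_PACKETS = {"COMMON-FLUSH-FDB", "COMPLETE-FLUSH-FDB"}

def handle_rrpp_packet(packet_type, vsi_mac_tables, snooping_vsi,
                       associated_vsis, notify_remote):
    """Return True if the packet triggered a MAC table flush."""
    if packet_type not in FLUSH_PACKETS:
        return False
    for vsi in [snooping_vsi, *associated_vsis]:
        vsi_mac_tables[vsi].clear()       # clear local forwarding entries
    notify_remote(snooping_vsi)           # e.g. trigger remote MAC withdrawal
    return True

tables = {"vsi1": {"aa:bb": "pw1"}, "vsi2": {"cc:dd": "pw2"}}
notified = []
handle_rrpp_packet("COMMON-FLUSH-FDB", tables, "vsi1", ["vsi2"], notified.append)
# tables for vsi1 and vsi2 are now empty; remote NPEs in vsi1 were notified
```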
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
Network planning engineers
Commissioning engineers
Data configuration engineers
System maintenance engineers
Security Declaration
Encryption algorithm declaration
The encryption algorithms DES/3DES/SKIPJACK/RC2/RSA (RSA-1024 or
lower)/MD2/MD4/MD5 (in digital signature scenarios and password encryption)/SHA1
(in digital signature scenarios) have low security and may bring security risks. If the
protocols allow, using more secure encryption algorithms, such as AES/RSA
(RSA-2048 or higher)/SHA2/HMAC-SHA2, is recommended.
Password configuration declaration
− Do not set both the start and end characters of a password to "%^%#". Otherwise,
the password is displayed directly in the configuration file.
− To further improve device security, periodically change the password.
Personal data declaration
Your purchased products, services, or features may use some of users' personal data during
service operation or fault locating. You must define user privacy policies in compliance
with local laws and take proper measures to fully protect personal data.
Feature declaration
− The NetStream feature may be used to analyze the communication information of
terminal customers for network traffic statistics and management purposes. Before
enabling the NetStream feature, ensure that it is performed within the boundaries
permitted by applicable laws and regulations. Effective measures must be taken to
ensure that information is securely protected.
− The mirroring feature may be used to analyze the communication information of
terminal customers for a maintenance purpose. Before enabling the mirroring
function, ensure that it is performed within the boundaries permitted by applicable
laws and regulations. Effective measures must be taken to ensure that information is
securely protected.
− The packet header obtaining feature may be used to collect or store some
communication information about specific customers for transmission fault and
error detection purposes. Huawei cannot offer services to collect or store this
information unilaterally. Before enabling the function, ensure that it is performed
within the boundaries permitted by applicable laws and regulations.
Special Declaration
This document serves only as a guide. The content is written based on device
information gathered under lab conditions. The content provided by this document is
intended to be taken as general guidance, and does not cover all scenarios. The content
provided by this document may be different from the information on user device
interfaces due to factors such as version upgrades and differences in device models,
board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are
beyond the scope of this document.
The maximum values provided in this document are obtained in specific lab
environments (for example, only a certain type of board or protocol is configured on a
tested device). The actually obtained maximum values may be different from the
maximum values provided in this document due to factors such as differences in
hardware configurations and carried services.
Interface numbers used in this document are examples. Use the existing interface
numbers on devices for configuration.
The pictures of hardware in this document are for reference only.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
Indicates an imminently hazardous situation which, if not
avoided, will result in death or serious injury.
Indicates a potentially hazardous situation which, if not avoided, may result in minor or
moderate personal injury, equipment damage, and environment deterioration.
Change History
Updates between document issues are cumulative. Therefore, the latest document issue
contains all updates made in previous issues.
Changes in Issue 03 (2017-09-20)
This issue is the third official release. The software version of this issue is
V800R009C10SPC200.
Changes in Issue 02 (2017-07-30)
This issue is the second official release. The software version of this issue is
V800R009C10SPC100.
Changes in Issue 01 (2017-05-30)
This issue is the first official release. The software version of this issue is
V800R009C10.
Definition
IMA is the acronym for Inverse Multiplexing for ATM. The general idea of IMA is that the
sender schedules and distributes a high-speed ATM cell stream across multiple low-speed
physical links for transmission, and the receiver then reassembles the stream fragments
into one cell stream and submits it to the ATM layer. In this manner, bandwidth is
multiplexed flexibly, improving the efficiency of bandwidth usage.
Purpose
When users access an ATM network at a rate between T1 and T3 or between E1 and E3, using
T3 or E3 lines is cost-ineffective for carriers. Using multiple T1 or E1 lines is more flexible
and efficient. IMA allows a network designer and administrator to use multiple T1 or E1 lines,
not the expensive T3 or E3 lines, to implement ATM access.
Benefits
IMA has the following advantages:
Provides a rate that is lower than the T3/E3 rate but higher than the T1/E1 rate.
Maintains the order of cells, which facilitates ATM management.
IMA provides the following benefits for carriers:
Network construction and maintenance cost less.
Networks can be expanded flexibly, and bandwidth usage is more efficient.
1.8.2.2 Principles
1.8.2.2.1 Basic IMA Principles
IMA performs inverse multiplexing of an ATM cell flow over multiple physical links and
restores the original cell flow from these physical links at the remote end. The ATM cell flow
is multiplexed onto the physical links on a per-cell basis. To understand the IMA feature, you
need to learn the following basic concepts.
Basic Concepts
IMA group
An IMA group can be considered a logical link that aggregates several low-speed
physical links (member links) to provide higher bandwidth. The rate of the logical link is
approximately the sum of the rates of the member links in the IMA group.
Minimum number of active links
This is the minimum number of active links required for the IMA group to enter the
Operational state. Link faults may cause the number of active links of an Operational
IMA group to fall below the configured minimum, in which case the IMA group status
changes and IMA may go Down. The two communicating devices can be configured
with different minimum numbers of active links, but each device must have at least its
configured minimum number of active links to properly send ATM cells.
ICP cell
ICP is short for IMA Control Protocol. ICP cells are a type of IMA negotiation cells,
used mainly to synchronize frames and transmit control information (such as the IMA
version, IMA frame length, and peer mode) between communicating devices. The offset
of ICP cells in IMA frames on a link is fixed. Like common cells, ICP cells consist of a
5-byte header and 48-byte payload.
Filler cell
In the ATM model without an IMA sub-layer, decoupling of cell rates is implemented by
Idle cells at the Transmission Convergence (TC) sub-layer. After the IMA sub-layer is
adopted, decoupling of cell rates can no longer be implemented at the TC sub-layer due
to frame synchronization. Therefore, Filler cells are defined at the IMA sub-layer to
implement decoupling of cell rates. If there is no ATM cell to be sent, the sender sends
Filler cells so that the physical layer transmits cells at a fixed rate. These filler cells are
discarded at the IMA receiving end.
Differential delay
Links in an IMA group may have different delays and jitters. If the delay difference
between the fastest and slowest links in an IMA group exceeds the configured
differential delay, the IMA group removes the link with the longest delay from the
cyclical sending queue and informs the peer that the link is unavailable by sending
ICP cells. After negotiation between the two ends of the link, the link becomes active
again and rejoins the cyclical sending queue of the IMA group.
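The minimum-active-links and differential-delay rules above can be sketched together as follows. The function names, example delay values, and threshold handling are illustrative assumptions, not the actual IMA group state machine.

```python
# Sketch of two IMA group checks: (1) the group stays Operational only while
# enough member links are active; (2) links whose delay spread exceeds the
# configured differential delay are removed from the cyclical sending queue.

def group_operational(active_links, min_active_links):
    """Operational-state check against the configured minimum."""
    return len(active_links) >= min_active_links

def enforce_differential_delay(link_delays_ms, max_diff_ms):
    """Remove the slowest link while the delay spread exceeds the limit."""
    links = dict(link_delays_ms)
    while links and max(links.values()) - min(links.values()) > max_diff_ms:
        slowest = max(links, key=links.get)
        del links[slowest]          # peer is informed the link is unavailable
    return links                    # links remaining in the sending queue

delays = {"e1-0": 2, "e1-1": 3, "e1-2": 40}   # ms, hypothetical member links
remaining = enforce_differential_delay(delays, max_diff_ms=25)
# "e1-2" is removed; the group stays Operational if min_active_links <= 2
```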
Table 1-136 Features supported by ATM IMA and their usage scenarios
Feature: IMA
Description: IMA divides one higher-speed transmission channel into two or more
lower-speed channels and transports an ATM cell stream across these lower-speed
channels. At the far end, IMA groups these lower-speed channels and reassembles the
cells to recover the original ATM cell stream. An IMA group can be considered a
logical link that aggregates several physical low-speed links (member links) to provide
higher bandwidth. The rate of the logical link is approximately the sum of the rates of
the member links in the IMA group.
Usage Scenario: When users access an ATM network at a rate between T1 and T3 or
between E1 and E3, using T3 or E3 lines is cost-ineffective for carriers. In this scenario,
IMA can be used. IMA transports ATM traffic over bundled low-speed T1 or E1 lines.
It allows a network designer and administrator to use these T1 or E1 lines, not the
expensive T3 or E3 lines, to implement ATM access.
Principles
Figure 1-519 shows inverse multiplexing and de-multiplexing of ATM cells in an IMA group.
The sending end: In the sending direction, IMA receives ATM cells from the ATM layer
and places them in circular order onto member links of the IMA group.
The receiving end: After reaching the receiving end, these cells are reassembled into the
original cell flow and transmitted onto the ATM layer. The IMA process is transparent to
the ATM layer.
Figure 1-519 Inverse multiplexing and de-multiplexing of ATM cells in an IMA group
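The per-cell round-robin scheme above can be sketched in a few lines. This is a minimal illustration of the circular placement and reassembly; real IMA framing (ICP cells, filler cells, frame synchronization) is omitted.

```python
# Minimal sketch of per-cell inverse multiplexing: the sender places cells onto
# member links in circular order; the receiver reads them back in the same
# order to rebuild the original cell flow.

def distribute(cells, num_links):
    """Sending end: round-robin cells onto member links of the IMA group."""
    links = [[] for _ in range(num_links)]
    for i, cell in enumerate(cells):
        links[i % num_links].append(cell)
    return links

def reassemble(links):
    """Receiving end: read links in the same circular order."""
    cells = []
    longest = max(len(link) for link in links)
    for i in range(longest):
        for link in links:
            if i < len(link):
                cells.append(link[i])
    return cells

cells = [f"cell{i}" for i in range(7)]
links = distribute(cells, 3)
assert reassemble(links) == cells   # the IMA process is transparent to the ATM layer
```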
1.8.2.3 Applications
1.8.2.3.1 ATM IMA Applications on an L2VPN
As shown in Figure 1-521, after ATM services from NodeA are converged at the E1 or T1
interface on PE1, ATM cells are encapsulated into PSN packets that can be transmitted over
PSNs. After arriving at the downstream PE2, the PSN packets are decapsulated into the
original ATM cells, which are then sent to the 3G radio network controller (RNC). In this
solution, services of multiple types are converged at a PE on a PSN. This improves the
usage efficiency of existing network resources, reduces Plesiochronous Digital Hierarchy
(PDH) VLLs, and facilitates the deployment of new sites as well as the maintenance and
management of multiple services.
Term
None.
Acronym/Abbreviation
Acronym/Abbreviation Full Spelling
FMC Fixed-Mobile Convergence
IMA Inverse Multiplexing for ATM
AN Access Node
PSN Packet Switched Network
IP RAN IP Radio Access Network
PWE3 Pseudo-Wire Emulation Edge-to-Edge
PW Pseudo Wire
QoS Quality of Service
1.8.3 ATM
This chapter describes the basic concepts, principles, and applications of Asynchronous
Transfer Mode (ATM) interface and protocol.
1.8.3.1 Introduction
Definition
ATM was designated as the transmission and switching mode for Broadband Integrated
Services Digital Networks (B-ISDN) by the ITU-T in June 1992. Due to its high flexibility
and support for multimedia services, ATM was considered the key to realizing broadband
communications.
Defined by the ITU-T, ATM implements transmission, multiplexing, and switching of data
based on cells. ATM is a cell-based and connection-oriented multiplexing and switching
technology.
An ATM cell has a fixed length of 53 bytes. As defined by the ITU-T, ATM transmits,
multiplexes, and switches data based on cells. For example, voice, video, and data messages
are all transmitted in cells of this fixed length, which ensures fast data transmission.
Purpose
ATM provides the network with a versatile and connection-oriented transfer mode that applies
to different services.
Before the advent of Gigabit Ethernet, ATM backbone switches were widely used on
backbone networks to ensure high bandwidth. ATM dominated among network technologies
because it could provide good QoS and transmit voice, data, and video with high bandwidth.
Nevertheless, the initial roadmap for ATM, which aimed to solve all network communication
issues, was too ambitious and idealistic. As a result, the ATM implementation became complicated.
The perfection ATM aimed for and the complexity of its architecture made the ATM system
difficult to develop, configure, manage, and troubleshoot. In addition, ATM network devices
are expensive, so ATM networks remained unaffordable for most users and ATM's
performance advantages went largely unrecognized.
In the late 1990s, the Internet and IP technology overshadowed ATM thanks to their
simplicity and flexibility, and they developed rapidly in practical applications. This dealt a
severe blow to the B-ISDN plan.
ATM is, however, still regarded as the best transmission technology for B-ISDN because of
its advantages in transporting integrated services. Therefore, IP technology was integrated
with ATM, ushering in a new era of constructing broadband networks through the
integration of the IP and ATM technologies.
1.8.3.3 Principles
1.8.3.3.1 ATM Protocol Architecture
AAL: The ATM adaptation layer (AAL) works with the ATM layer and is similar to the
data link layer of the OSI reference model. AAL is mainly responsible for isolating the
upper-layer protocols from the ATM layer. It prepares data for conversion into cells and
divides the data into 48-byte cell payloads.
Upper layer: It receives data, divides it into packets, and transmits it to AAL for
processing.
Each layer is further divided into several sub-layers.
The comparison between the ATM protocol architecture and the OSI reference model is
shown in Figure 1-523.
Figure 1-523 Comparison between the ATM protocol architecture and the OSI reference model
Table 1-137 Functions of layers and sub-layers in the ATM reference model
The detailed functions of layers and sub-layers in the ATM reference model are described in
the following sections.
Table 1-138 Comparison between the common transmission rates of SONET and SDH
− The user layer lies at the top of the SONET physical layer.
− The transmission channel layer, digital line layer, and segment regeneration layer
are three sub-layer entities of the SONET physical layer.
The transmission channel layer is mainly responsible for assembling and
disassembling cells for SONET frame signals.
The digital line layer adds the packet header (such as system overhead) and
performs multiplexing.
The segment regeneration layer includes the segment layer and photon layer.
After data arrives at the segment regeneration layer, the segment layer appends
a segment header, encapsulates the data in a frame, and transmits this frame to
the photon layer. The photon layer then sends the frame after converting the
electrical signals into optical signals.
The frame format of the STS-3c that bears ATM cells is shown in Figure 1-525.
The direct mapping mode is more efficient than the PLCP mode and can support up
to 44.21 Mbit/s.
Similar to DS-3, E3 adopts two technologies: PLCP and direct mapping into E3.
Compared with DS-3 PLCP, E3 PLCP has the following differences:
− It adopts the G.751 format, and inserts the tail used to synchronize E3 after every
nine cells.
− Its tail length ranges from 18 to 20 bytes, and that of DS-3 PLCP ranges from 6.5 to
7 bytes.
ATM cells are directly mapped into E3 frames in the G.832 standard.
ATM cells are directly mapped into a 530-byte payload, with the system overhead
occupying 7 bytes.
ATM IMA
1.8.2 ATM IMA describes the principles of ATM IMA.
ATM Bundling
ATM bundling is an extended ATM PWE3 application and is applicable to IP RAN networks.
On the network shown in Figure 1-528, NodeBs are connected to a cell site gateway (CSG)
over ATM links. Each NodeB may transmit both voice and data services. Configuring a
PWE3 PW for each service on every NodeB connected to a radio network controller (RNC)
would place a heavy burden on the CSG. Bundling physical links into one PW to transmit the
same type of service from different NodeBs to the RNC relieves the burden on the CSG and
provides service scalability.
ATM bundling is an ATM PWE3 extension that provides logical ATM bundle interfaces.
PWE3 PWs are established on ATM bundle interfaces, and PVCs are configured on serial
sub-interfaces (with ATM specified as the link layer protocol). After the serial sub-interfaces
join an ATM bundle interface, the PVCs on these sub-interfaces are mapped to specified PWs.
This reduces the number of PWs and the system burden. ATM bundle interfaces forward
traffic as follows:
1. After receiving user traffic through a PVC of an ATM bundle member interface on a
CSG, the CSG forwards user traffic to a PW to which the PVC is mapped.
2. After receiving traffic from an RNC, the CSG maps traffic to specific ATM bundle
member interfaces based on PVCs and these ATM bundle member interfaces forward
traffic to specific nodeBs.
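The two forwarding directions above can be sketched as a pair of lookups. The interface names, VPI/VCI values, and PW labels are hypothetical examples, not device configuration.

```python
# Illustrative sketch of ATM bundling on a CSG: PVCs on bundle member (serial)
# sub-interfaces are mapped to PWs on the ATM bundle interface, and the
# reverse mapping restores the member interface for downstream traffic.

# (member sub-interface, (vpi, vci)) -> PW; built when sub-interfaces join the bundle
pvc_to_pw = {("serial1/0/0.1", (1, 100)): "pw-voice",
             ("serial1/0/0.2", (1, 200)): "pw-data"}
pw_to_pvc = {pw: pvc for pvc, pw in pvc_to_pw.items()}

def upstream(member_if, vpi_vci):
    """NodeB -> CSG: forward traffic from a member PVC onto its mapped PW."""
    return pvc_to_pw[(member_if, vpi_vci)]

def downstream(pw):
    """RNC -> CSG: map PW traffic back to the member interface and PVC."""
    return pw_to_pvc[pw]

assert upstream("serial1/0/0.1", (1, 100)) == "pw-voice"
assert downstream("pw-data") == ("serial1/0/0.2", (1, 200))
```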
User-to-Network Interface
The UNI defines the interfaces between the peripheral devices and ATM switches.
Depending on whether the switches are owned by clients or operators, UNIs can be
divided into public UNIs and private UNIs.
A private UNI connects an ATM peripheral device to a switch on a private ATM network
and is used inside the private ATM network. A public UNI connects an ATM peripheral
device or a private ATM switch to a public ATM switch.
Network-to-Network Interface
The NNI refers to the interfaces between ATM switches.
Depending on whether the switches are owned by clients or operators, NNIs can be
divided into two types: public NNIs and private NNIs.
A private NNI connects two switches on the same private ATM network and is used
inside that network. A public NNI connects two ATM switches of the same public
network carrier and is used within one ATM service provider's network.
B-ISDN Inter Carrier Interface
A B-ISDN Inter Carrier Interface (B-ICI) is connected to the public switches of different
network carriers and provides internal connections to multiple ATM network carriers.
B-ICIs are directly connected to NNIs.
Figure 1-529 shows the connections between various ATM network interfaces.
Figure 1-529 ATM network interfaces of the private and public networks
VPs are used to adapt to high-speed networks in which network control costs are increasing.
The VP technology reduces control costs by bundling connections that share the same path
across the network into a single unit. Network management then needs to process only a
smaller number of bundled connections rather than a large number of independent connections.
In ATM communication, an ATM switch transmits received cells to the output interface
according to the VPI/VCI of the incoming cells and the forwarding table generated during
connection setup. At the same time, the ATM switch rewrites the VPI/VCI of each cell to
that of the outgoing interface, completing VP switching or VC switching.
ATM VCs are of the following types: permanent virtual circuit (PVC), switching virtual
circuit (SVC), and soft virtual circuit (soft VC).
The PVC is statically configured by the administrator. Once set up, it remains in place
until it is manually removed. PVCs apply to relatively fixed connections.
The SVC is set up through the signaling protocol and can be established and removed
dynamically.
When a node receives a connection request from another node, it returns a connection
response if the configuration requirements are satisfied. After the connection to this
node is set up, the connection request is sent on to the next target node.
The teardown process is similar to the setup process.
Soft VC indicates that the ATM network is based on SVC, but peripheral devices access
the ATM network in PVC mode.
The setup of soft VCs is similar to that of SVCs. The only difference is that PVCs
must be manually configured between ATM switch interfaces and peripheral devices.
The advantage of this mode is that users connected through PVCs are easy to manage,
while SVCs ensure efficient usage of the links.
In the ATM switching table shown in Figure 1-532, the first line shows that for cells arriving
at the switch with VPI/VCI 4/55, the switch rewrites the cell header VPI/VCI to 8/62 and
sends the cells out through port 3.
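The table lookup and header rewrite can be sketched as follows, using the 4/55 to 8/62 example. The incoming port number (1) and dictionary structure are assumptions for illustration; the document's figure keys the entry on the incoming VPI/VCI.

```python
# Sketch of the VP/VC switching step: look up the incoming (port, VPI, VCI),
# rewrite the cell header, and emit the cell on the output port. The table is
# built during connection setup.

# (in_port, in_vpi, in_vci) -> (out_port, out_vpi, out_vci)
switching_table = {(1, 4, 55): (3, 8, 62)}

def switch_cell(in_port, cell):
    out_port, out_vpi, out_vci = switching_table[(in_port, cell["vpi"], cell["vci"])]
    cell["vpi"], cell["vci"] = out_vpi, out_vci   # rewrite the cell header
    return out_port, cell

port, cell = switch_cell(1, {"vpi": 4, "vci": 55, "payload": b"..."})
assert port == 3 and (cell["vpi"], cell["vci"]) == (8, 62)
```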
The UNI cell header is used for communication between the ATM terminal and switching
nodes on an ATM network.
Figure 1-533 shows the format of a UNI cell header.
The NNI cell header is used for communication between two switching nodes.
Figure 1-534 shows the NNI cell header format.
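The UNI header layout (4-bit GFC, 8-bit VPI, 16-bit VCI, 3-bit PTI, 1-bit CLP, 8-bit HEC, totaling 5 bytes) can be sketched as a pack/parse pair. The HEC is left as zero here; computing the real CRC-8 header checksum is out of scope for this sketch.

```python
# Sketch packing and parsing the 5-byte UNI cell header. Bit positions follow
# the standard field order: GFC | VPI | VCI | PTI | CLP | HEC.

def pack_uni_header(gfc, vpi, vci, pti, clp, hec=0):
    bits = (gfc << 36) | (vpi << 28) | (vci << 12) | (pti << 9) | (clp << 8) | hec
    return bits.to_bytes(5, "big")

def parse_uni_header(header):
    bits = int.from_bytes(header, "big")
    return {"gfc": (bits >> 36) & 0xF, "vpi": (bits >> 28) & 0xFF,
            "vci": (bits >> 12) & 0xFFFF, "pti": (bits >> 9) & 0x7,
            "clp": (bits >> 8) & 0x1, "hec": bits & 0xFF}

hdr = pack_uni_header(gfc=0, vpi=4, vci=55, pti=0, clp=0)
assert parse_uni_header(hdr)["vci"] == 55
```

For the NNI header, the GFC field is replaced by four extra VPI bits (12-bit VPI), so an analogous pair of functions would shift the VPI up to bit 28 with a 12-bit mask.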
Unassigned cell: Its VPI is 0, VCI is 0, PTI can be any value, and CLP is 1.
OAM cell: For the VP sub-layer, its VCI is 3 and it is used for the VP link. When VCI is
4, it is used for the VP connection. For the VC sub-layer, it is used for the VC link when
PTI is 4. When PTI is 5, it is used for the VC connection.
Signaling cell: It is divided into the following types:
− Component signaling cell: Its VPI can be any value, and VCI is 1.
− General broadcast signaling cell: Its VPI can be any value, and VCI is 2.
− Point-to-point (P2P) signaling cell: Its VPI can be any value, and VCI is 5.
Payload type: Its length is 3 bits. It is used to identify the information field, that is, the
payload type. The following lists the PT values and corresponding meanings defined by
the ITU-T I.361.
− PT = 000: indicates that the data cell does not experience congestion and ATM user
to user (AUU) is 0.
− PT = 001: indicates that the data cell does not experience congestion and AUU is 1.
− PT = 010: indicates that the data cell experiences congestion and AUU is 0.
− PT = 011: indicates that the data cell experiences congestion and AUU is 1.
− PT = 100: indicates the cells related to the OAM F5 segment.
− PT = 101: indicates the OAM F5 end-to-end cells.
− PT = 110: indicates the resource management cells.
− PT = 111: This PT is for future use.
When cells are used to carry data:
The first bit of PT is 0.
The second bit identifies whether the cell has experienced congestion and can be set by a
network node when congestion occurs.
The third bit is an AUU indicator. AUU = 0 indicates that the corresponding SAR-PDU
is the beginning segment or intermediate segment. AUU = 1 indicates that SAR-PDU is
the ending segment.
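The bit meanings above for data cells can be sketched as a small decoder. The function name and return convention are illustrative.

```python
# Sketch decoding the 3-bit PT field of a data cell: the first bit is 0 for
# data, the second bit is the congestion indication, and the third bit is the
# AUU indicator (1 = ending segment of the SAR-PDU).

def decode_data_pt(pt):
    assert pt & 0b100 == 0, "not a data cell"
    congested = bool(pt & 0b010)      # set by a network node under congestion
    last_segment = bool(pt & 0b001)   # AUU = 1: ending segment
    return congested, last_segment

assert decode_data_pt(0b000) == (False, False)   # PT = 000
assert decode_data_pt(0b011) == (True, True)     # PT = 011: congested, AUU = 1
```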
ATM OAM
Overview of OAM
According to different protocols, OAM has two different definitions.
− OAM: Operation And Maintenance (ITU-T I.610 02/99)
− OAM: Operation Administration and Maintenance (LUCENT APC User Manual,
03/99)
OAM offers a mechanism to detect and locate faults and to verify network performance
without interrupting services. By inserting OAM cells with a standard structure into the
user cell flow, specific information can be provided.
ATM OAM Supported by NE20E
Currently, on Huawei NE20Es, OAM mainly checks the connectivity of PVCs.
The OAM process is as follows:
a. The two ends simultaneously send OAM cells to their peers at a specified interval.
b. If the peer replies after receiving an OAM cell, the link is normal. If the local
timer detects that the OAM cell has timed out, the local port considers the link
faulty.
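The timer-driven check above can be sketched as follows. The class name, interval, and timeout values are illustrative assumptions, not NE20E configuration defaults.

```python
# Hedged sketch of the OAM PVC connectivity check: both ends send OAM cells at
# a fixed interval; a reply refreshes the timer, and a timeout marks the link
# as failed.

SEND_INTERVAL = 1      # seconds between OAM cells (assumed)
TIMEOUT = 5            # seconds without a reply before declaring failure (assumed)

class OamMonitor:
    def __init__(self):
        self.last_reply_at = 0.0
        self.link_up = True

    def on_reply(self, now):
        """Peer replied to our OAM cell: the PVC is reachable."""
        self.last_reply_at = now
        self.link_up = True

    def tick(self, now):
        """Called each send interval; declare the link down on timeout."""
        if now - self.last_reply_at > TIMEOUT:
            self.link_up = False
        return self.link_up

mon = OamMonitor()
mon.on_reply(now=2.0)
assert mon.tick(now=4.0) is True     # reply within the timeout window
assert mon.tick(now=9.0) is False    # no reply for > 5s: link considered failed
```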
OAM functions can vary with different chips. Main OAM functions are as follows.
error detection. Frames are padded so that their length becomes an integer multiple of the
48-byte payload.
Segmentation and Reassembly
When peripheral devices send data, segmentation and reassembly (SAR) is used to
divide aggregation frames into 48-byte payloads. When peripheral devices receive data,
SAR is used to reassemble 48-byte payloads into aggregation frames.
AAL Type
Currently, there are four types of AAL: AAL1, AAL2, AAL3/4, and AAL5. Each type
supports certain specified services on the ATM network. Most ATM equipment
manufacturers adopt AAL5 to support data communication services.
AAL1
AAL1 is used for constant bit rate (CBR), sending data at a fixed interval.
AAL1 uses part of the 48-byte payload to carry additional information, such as the
sequence number (SN) and sequence number protection (SNP). The SN field contains a
1-bit convergence sublayer indication (CSI) and a 3-bit sequence count (SC). The CSI bit
is also used for timing.
AAL2
Compared with AAL1, AAL2 can transmit compressed voice and realize common
channel signaling (CCS) inside ISDN.
Details on AAL2 are defined in ITU-T I.363.2.
AAL2 supports the processing of compressed voice at rates of up to 5.3 kbit/s. This
enables silence detection, suppression, and elimination, as well as CCS. In addition,
higher bandwidth utilization is achieved because voice segments can be encapsulated
into one or multiple ATM cells.
The convergence sublayer (CS) of AAL2 is divided into a common part convergence
sublayer (CPCS) and a service-specific convergence sublayer (SSCS), with the SSCS on
top of the CPCS. The CPCS recognizes the basic structure of AAL2 user data and
performs error checking, data encapsulation, and payload breakdown.
AAL2 allows payloads of variable length to exist in one or multiple ATM cells.
AAL3/4
As the first technology attempting to realize cell relay, AAL3/4 stipulates
connection-oriented and connectionless data transmission.
CPCS is used to detect and process errors, identify the CPCS-service data unit (SDU) to
be transmitted, and determine the length of the CPCS-packet data unit (PDU).
AAL5
AAL5 can also process connection-oriented and connectionless data. AAL5 is called the
simple and efficient adaptation layer. It uses all 48 bytes to carry payload information
and adds no per-cell overhead: the SAR sublayer contains no sequence number and
performs no error detection of its own.
AAL5 SAR sublayer is simple. It divides CPCS-PDUs into 48-byte SAR-PDUs without
any overhead and realizes the reverse function when receiving data.
The CPCS-PDU format of AAL5 CPCS is shown in Figure 1-535.
The length of the CPCS-PDU payload is variable and ranges from 1 to 65535 bytes.
As shown in Figure 1-535, no CPCS-PDU header exists. A CPCS-PDU tail, however,
occupies eight bytes. The meaning of each field in Figure 1-535 is as follows:
− PAD: padding bytes that make the CPCS-PDU length an integer multiple of 48
bytes.
− UU: used for transparent transmission of CPCS user information.
− CPI: used to align the CPCS-PDU trailer to 8 bytes.
− L: indicates the payload length of the CPCS-PDU.
− CRC: protects the CPCS-PDU against errors.
The SSCS of the AAL5 CS is similar to that of AAL3/4, and the CPCS is likewise shared
by upper layers. The CPCS performs error detection, processes errors, pads data to form
48-byte payloads, and discards incomplete received CPCS-PDUs.
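The PAD and trailer rules above can be illustrated with a short sketch that pads a payload, builds the 8-byte trailer, and splits the resulting CPCS-PDU into 48-byte SAR-PDUs. This is an illustrative sketch only: Python's standard CRC-32 is used as a stand-in for the AAL5 CRC-32 (which uses the same polynomial but different bit-ordering conventions per ITU-T I.363.5), and the UU and CPI fields are simply set to 0.

```python
import binascii
import struct

def build_aal5_cpcs_pdu(payload: bytes) -> bytes:
    """Build an AAL5 CPCS-PDU: payload + PAD + 8-byte trailer (UU, CPI, Length, CRC)."""
    # PAD makes (payload + 8-byte trailer) an integer multiple of 48 bytes.
    pad_len = (-(len(payload) + 8)) % 48
    body = payload + b"\x00" * pad_len
    trailer_wo_crc = bytes([0, 0]) + struct.pack(">H", len(payload))  # UU, CPI, Length
    # Stand-in CRC: Python's CRC-32, not the exact I.363.5 computation.
    crc = binascii.crc32(body + trailer_wo_crc) & 0xFFFFFFFF
    return body + trailer_wo_crc + struct.pack(">I", crc)

def segment(cpcs_pdu: bytes):
    """SAR sublayer: split the CPCS-PDU into 48-byte cell payloads with no overhead."""
    return [cpcs_pdu[i:i + 48] for i in range(0, len(cpcs_pdu), 48)]
```

A 5-byte payload, for instance, receives 35 PAD bytes so that payload, PAD, and trailer together fill exactly one 48-byte cell.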
LLC/SNAP Encapsulation
LLC encapsulation is needed when several protocols are carried over the same VC. To ensure
that the receiver properly processes the received AAL5 CPCS-PDU packets, the payload field
must contain information necessary to identify the protocol of the routed or bridged PDU. In
LLC encapsulation, this information is encoded in an LLC header placed in front of the
carried PDU.
There are two types of LLC:
LLC type 1: Unacknowledged connectionless mode
A routed ISO protocol is identified by a 1-byte Network Layer Protocol Identifier
(NLPID) field that is part of the protocol data. NLPID values are administered by ISO
and ITU-T.
An NLPID value of 0x00 is defined in ISO/IEC TR 9577 as the null network layer or
inactive set. Because it has no significance within the context of this encapsulation
scheme, an NLPID value of 0x00 is invalid.
Although IP is not an ISO protocol, IP has an NLPID value of 0xCC. However, this
encapsulation format is rarely used for IP.
The LLC header value 0xAA-AA-03 identifies a SNAP header as defined in IEEE 802.1a.
Figure 1-538 shows the format of a SNAP header.
In the detailed format of an IPv4 PDU, the Ethernet type value is 0x08-00. Figure 1-540
shows the format of the IP PDU.
The AAL5 CPCS-PDU Payload field carrying a bridged PDU must have one of the
following formats.
It is required to add padding after the PID field to align the user information field of the
Ethernet, 802.3, 802.4, 802.5, FDDI, and 802.6 PDUs.
The sequence of a MAC address must be the same as that in the LAN or MAN.
Padding is added to ensure that the length of a frame on the Ethernet/802.3 physical layer
reaches the minimum value. Padding must be added when bridged Ethernet/802.3 PDU
encapsulation with the LAN FCS is used. Otherwise, you do not need to add padding.
When frames without the LAN FCS are received, the bridge must add some padding to
the frames before forwarding the frames to an Ethernet/802.3 subnet.
The common PDU header and trailer are conveyed in sequence at the egress bridge to an
802.6 subnet. Specifically, the common PDU header contains the BAsize field, which
contains the length of the PDU.
If this field is not available to the egress 802.6 bridge, that bridge cannot begin to
transmit the segmented PDU until it has received the entire PDU, calculated the length,
and inserted the length into the BAsize field.
If the field is available, the egress 802.6 bridge can extract the length from the BAsize
field of the Common PDU header, insert it into the corresponding field of the first
segment, and immediately transmit the segment onto the 802.6 subnet.
For the egress 802.6 bridge, you can set the length of the AAL5 CPCS-PDU to 0 to
ignore AAL5 CPCS-PDUs.
VC Multiplexing
In VC-based multiplexing, the VC between two ATM sites implicitly identifies the carried
internetworking protocol. That is, each protocol must be carried over a separate VC.
Therefore, no additional multiplexing information is contained in the payload of each AAL5
CPCS-PDU. This saves bandwidth and reduces processing cost.
VC Multiplexing for Routed Protocols
In VC multiplexing for routed protocols, the Payload field of an AAL5 CPCS-PDU
contains only the routed PDU packet. The format of the PDU packet is shown in Figure
1-546.
1.8.3.4 Application
1.8.3.4.1 IPoA
IP over AAL5 (IPoA) means that AAL5 bears IP packets. That is, IP packets are encapsulated
in ATM cells and transmitted on the ATM network.
Realization
As shown in Figure 1-550, on Device A, PVC 0/40 can reach Device B, and PVC 0/41 can
reach Device C. If IP packets sent to Device B need to be sent from PVC 0/40, the IP address
of Device B must be mapped on PVC 0/40. After address mapping is set up, Device A sets up
a route that reaches the IP address of Device B. The outgoing interface is the interface where
ATM PVC 0/40 resides.
1.8.3.5 Impact
1.8.3.5.1 On System Performance
Terms
Term Description
ATM Recommendation ITU-R F.1499 defines the Asynchronous
Transfer Mode (ATM) as a protocol for the transmission of a
variety of digital signals using uniform 53 byte cells.
Recommendation ITU-R M.1224 defines ATM as a transfer mode
in which information is organized into cells. It is asynchronous in
the sense that the recurrence of cells depends on the required or
instantaneous bit rate. Statistical and deterministic values may also
be used to qualify the transfer mode.
Cell ATM organizes digital data into 53-byte cells and then transmits,
multiplexes, or switches them. An ATM cell consists of 53 bytes.
The first 5 bytes are the cell header, which contains routing and
priority information. The remaining 48 bytes are the payload.
Multi-network PVC A multi-network PVC travels multiple networks. It consists of PVC
segments on different networks.
Sub-interface Sub-interfaces enable one physical interface to provide multiple
logical interfaces. Configuring sub-interfaces on a physical
interface associates these logical interfaces with the physical
interface.
CC Continuity Check
CCITT International Telegraph and Telephone Consultative Committee
CHAP Challenge Handshake Authentication Protocol
CLP Cell Loss Priority
CPCS Common Part Convergence Sublayer
CS Convergence Sublayer
FDDI Fiber Distributed Data Interface
GFC Generic Flow Control
HEC Header Error Control
IPoA Internet Protocol over ATM
ITU-T International Telecommunication Union - Telecommunication Standardization Sector
LLC Logical Link Control
MMF Multi-mode Fiber
NNI Network-to-Network Interface
OAM Operation, Administration and Maintenance
OSI Open System Interconnection
PAP Password Authentication Protocol
PLCP Physical Layer Convergence Protocol
PM Performance Monitoring
PPP Point-to-Point Protocol
PT Payload Type
PTI Payload Type Indicator
PVC Permanent Virtual Circuit
QoS Quality of Service
RDI Remote Defect Indication
SAR Segmentation and Reassembly
SAR-PDU Segmentation and Reassembly-Protocol Data Unit
SDH Synchronous Digital Hierarchy
Definition
Frame Relay (FR) is a Layer 2 packet-switched technology that allows devices to use virtual
circuits (VCs) to communicate on wide area networks (WANs).
Purpose
During the 1990s, rapid network expansion gave rise to the following requirements on
networks:
1. High transmission rate and low delay
2. Bandwidth reservation for traffic bursts
3. Accommodation for diversified intelligent user devices
The traditional methods used to meet the preceding requirements are circuit switching (leased
lines) and X.25 packet switching. However, these two methods have the following
disadvantages:
Circuit switching: Service deployment is costly, link usage efficiency is low, and
transmission of traffic bursts is unsatisfactory.
X.25 packet switching: Switches and service deployment are costly, and because the
X.25 protocol is complicated, the transmission rate is low and the latency high.
FR was therefore introduced to meet such requirements. Unlike circuit switching and X.25
packet switching, FR is highly efficient, cost-effective, reliable, and flexible. With these
advantages, FR became popular in WAN deployment in the 1990s. Table 1-140 compares
circuit switching, X.25 packet switching, and FR.
Table 1-140 Comparison among circuit switching, X.25 packet switching, and FR
Performance Indicator | Circuit Switching | X.25 Packet Switching | FR
Time Division Multiplexing (TDM) | Supported | Not supported | Not supported
VC multiplexing | Not supported | Supported | Supported
Port sharing | Not supported | Supported | Supported
Transparent transmission | Supported | Not supported | Supported
Traffic burst processing | Not supported | Supported | Supported
High throughput | Supported | Not supported | Supported
Transmission rate | Low | Low | Low
Delay | Very short | Long | Short
Cost | High | Medium | Low
Function Description
FR operates at the physical and data link layers of the Open System Interconnection (OSI)
reference model and is independent of upper layer protocols. This simplifies FR service
deployment. Characterized by a short network delay, low deployment costs, and high
bandwidth usage efficiency, FR became a popular communication technology in the early
1990s for WAN applications. FR has the following features:
Transmits data in variable-size units called frames.
Uses VCs instead of physical links to transmit data. Multiple VCs can be multiplexed
over one physical link, which improves bandwidth usage.
Is a streamlined version of X.25 and retains only the core functionality of the link layer,
thereby improving data processing efficiency.
Performs statistical multiplexing, frame transparent transmission, and error check at the
link layer. If FR detects an error, it drops the error frame; FR does not correct the errors.
In this way, FR does not involve frame sequencing, flow control, response, or monitoring
mechanism, and therefore reduces switch deployment costs, improves network
throughput, and shortens communication delay. The access rate of FR users ranges from
64 kbit/s to 2 Mbit/s.
Supports a frame size of at least 1600 bytes, suitable for LAN data encapsulation.
Provides several effective mechanisms for bandwidth management and congestion
control. Besides reserving committed bandwidth resources for users, FR also allows
traffic bursts to occupy available bandwidth, which improves bandwidth usage.
Is a connection-oriented packet-switched technology. It supports two types of circuits:
permanent virtual circuits (PVCs) and switched virtual circuits (SVCs). Currently, only
PVC services are deployed on FR networks.
Benefits
FR offers the following benefits:
Easy deployment. FR can be deployed on X.25 devices after upgrading the device
software; existing applications and hardware require no modification.
Flexible accounting mode. FR is suitable for traffic bursts and requires lower user
communication expenditure.
Dynamic allocation of idle network resources. FR increases carrier returns on
existing investments by utilizing idle network resources.
1.8.4.2 Principles
1.8.4.2.1 Introduction
On an FR network, devices connect to each other over VCs. A VC is a logical connection that
is identified by a data-link connection identifier (DLCI). Multiple VCs form a PVC.
The following describes several concepts involved in FR.
DLCI
DLCIs are used to identify VCs.
A DLCI is valid only on the local interface and its directly connected remote interface, and
enables the remote interface to know to which VC a frame belongs. Because FR VCs are
connection-oriented, the local DLCIs can be considered as FR addresses provided by local
devices.
A user interface on an FR network supports a maximum of 1024 VCs, and the number of
available DLCIs ranges from 16 to 1007.
A PVC is established between two DTEs that are connected through NNIs. VCs are
differentiated by their DLCIs.
VC
A VC is a virtual circuit established between two devices on a packet-switched network. VCs
can be classified as either PVCs or SVCs.
PVCs are manually configured.
SVCs are automatically created and deleted through protocol negotiation.
PVCs are more prevalent on FR networks because few manufacturers of frame relay DCEs support SVC
connections.
1.8.4.2.2 LMI
Introduction
Both a DCE and its connected DTE need to know the PVC status. Local Management
Interface (LMI) is a protocol that uses status enquiry messages and state messages to maintain
link and PVC status, including adding PVC status information, deleting information about
disconnected PVCs, monitoring PVC status changes, and checking link integrity. There are
three standards for LMI:
ITU-T Q.933 Appendix A
ANSI T1.617 Appendix D
Vendor-specific implementation
This section describes LMI defined in ITU-T Q.933 Appendix A, which specifies the
information units and LMI implementation.
LMI Messages
There are two types of LMI messages:
Status enquiry messages: sent from a DTE to a DCE to request the PVC status or detect
the link integrity.
Status messages: sent from a DCE to a DTE to respond to status enquiry messages. The
status messages carry the PVC status or link integrity information.
LMI Reports
There are three types of LMI reports:
Link integrity verification only report: verifies the link integrity.
Full status report: verifies the link integrity and transmits link integrity information and
PVC status.
Single PVC asynchronous status report: notifies a DTE of a PVC status change.
On a UNI that connects a DTE to a DCE, the PVC status of the DTE is determined by the
DCE. To request the PVC status, the DTE sends a status enquiry message to the DCE. Upon
receipt of the message, the DCE replies with a status message that carries the requested status
information. The PVC status of the DCE is determined by other devices connected to the
DCE.
On an NNI that connects DCEs of a network, the DCEs periodically exchange PVC status.
1. A DTE sends a status enquiry message to its connected DCE, and at the same time, the
link integrity verification polling timer (T391) and the DTE counter (V391) start. The
value of T391 specifies the interval at which status enquiry messages are sent. The value
of the full status polling counter (N391), which includes the status of all PVCs, specifies
the interval at which full status reports are sent. You can specify the values of T391 and
N391 or use the default values.
− If the value of V391 is less than that of N391, the status enquiry message sent by
the DTE requests only link integrity information.
− If the value of V391 is equal to that of N391, V391 is reset to 0, and the status enquiry
message sent by the DTE requests link integrity and PVC status information.
2. After receiving the enquiry message, the DCE responds with a status message, and at the
same time, the polling confirm timer (T392) of the DCE starts. If the DCE does not
receive a subsequent status enquiry message before T392 expires, the DCE records an
event and increases the value of the monitored events counter (N393) by 1.
3. The DTE checks the status message from the DCE. In addition to responding to every
enquiry that the DTE sends, the DCE automatically informs the DTE of the PVC status
when the PVC status changes or a PVC is added or deleted. This mechanism enables the
DTE to learn the PVC status in real time and maintain up-to-date records.
4. If the DTE does not receive a status message before T391 expires, the DTE records an
event and increases the value of N393 by 1.
5. N393 is an error threshold and records the number of events that have occurred. If the
value of N393 is greater than that of N392, the DTE or DCE considers the physical link
and all VCs unavailable. You can specify the values of N392 and N393 or use the default
values.
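The polling logic in steps 1 and 2 can be sketched as follows. This is an illustrative model only, not the NE20E implementation; the class and method names are invented, but the counter behavior mirrors the V391/N391 rule described above.

```python
class LmiDte:
    """Sketch of the DTE side of LMI polling: every N391th enquiry is a full status request."""

    def __init__(self, n391: int = 6):
        self.n391 = n391  # full status polling counter
        self.v391 = 0     # polling cycle counter, compared against N391

    def next_enquiry(self) -> str:
        """Called each time T391 expires; returns the type of status enquiry to send."""
        self.v391 += 1
        if self.v391 >= self.n391:
            self.v391 = 0
            return "full status"          # requests link integrity + all PVC status
        return "link integrity only"      # requests link integrity information only
```

With N391 = 6, five link integrity only enquiries are sent for every full status enquiry, matching the (N391 - 1)/1 ratio given in Table 1-141.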
Table 1-141 lists the parameters required for LMI packet exchange. These parameters can be
configured to optimize device performance.
Device | Parameter | Name | Description
DTE | N391 | Full status polling counter | The DTE sends a full status report or a link integrity verification only report at an interval specified by T391. The numbers of reports of each type are determined using the following formula: Number of link integrity verification only reports/Number of full status reports = (N391 - 1)/1.
DTE | N392 | Error threshold | Specifies the threshold number of errors.
DTE | N393 | Monitored event counter | Specifies the total number of monitored events.
DTE | T391 | Polling timer at the user side | Specifies the interval at which the DTE sends status enquiry messages.
FR Frame Encapsulation
FR encapsulates a network layer protocol (IP or IPX) in the Data field of a frame and sends
the frame to the physical layer for transmission. Figure 1-553 shows FR frame encapsulation.
Upon receipt of a Protocol Data Unit (PDU) from a network layer protocol (IP for example),
FR places the PDU between the Address field and frame check sequence (FCS). FR then adds
Flags to delimit the beginning or end of the frame. The value of the Flags field is always
01111110. After the encapsulation, FR sends the frame to the physical layer for transmission.
Figure 1-554 shows the basic format of an FR frame. In the format, the Flags field indicates
the beginning or end of the FR frame, and key information about the frame is carried in
Address, Data, and FCS. The 2-byte Address field comprises a 10-bit data-link
connection identifier (DLCI) and 6 bits used for control and congestion management.
A maximum of 1024 VCs can be configured on a user interface of an FR device, but the number of
available DLCIs ranges from 16 to 1007. The values 0 and 1023 are reserved for LMI.
− C/R: follows the DLCI in the Address field. The C/R bit is currently not defined.
− Extended Address (EA): indicates whether the current byte is the last byte of the
Address field. If the EA value is 1, the current byte is the last DLCI byte.
Although a 2-byte Address field is generally used in FR, the EA mechanism
supports longer DLCIs. The eighth bit of each byte of the Address field is the EA bit.
− Congestion control: consists of three bits, which are forward-explicit congestion
notification (FECN), backward-explicit congestion notification (BECN), and
discard eligibility (DE).
Data: contains encapsulated upper-layer data. Each frame in this variable-length field
includes a user data or payload field of a maximum of 16000 bytes.
FCS: is used to check the integrity of frames. A source device computes an FCS value
and adds it to a frame before sending the frame to a receiver. Upon receipt of the frame,
the receiver computes an FCS value and compares the two FCS values. If the two values
are the same, the receiver processes the frame; if the two values are different, the
receiver discards the frame. If the frame is discarded, FR does not send a notification to
the source device. Error control is implemented by the upper layers of the OSI model.
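The 2-byte Address field described above can be decoded with a short sketch. The bit positions follow the standard Q.922 layout for the default 2-byte format; the function name and returned keys are illustrative, not from the NE20E software.

```python
def parse_fr_address(b1: int, b2: int) -> dict:
    """Decode a default 2-byte Frame Relay Address field (Q.922 layout)."""
    dlci = ((b1 >> 2) << 4) | (b2 >> 4)   # 6 high DLCI bits + 4 low DLCI bits = 10 bits
    return {
        "dlci": dlci,
        "cr":   (b1 >> 1) & 1,   # C/R bit (currently not defined by FR)
        "fecn": (b2 >> 3) & 1,   # forward-explicit congestion notification
        "becn": (b2 >> 2) & 1,   # backward-explicit congestion notification
        "de":   (b2 >> 1) & 1,   # discard eligibility
        "ea":   b2 & 1,          # extended address: 1 marks the last address byte
    }
```

For example, the byte pair 0x18, 0x41 decodes to DLCI 100 with EA = 1 in the second byte.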
FR Frame Forwarding
On the network shown in Figure 1-555, the source device and receiver are connected through
a PVC passing through Device A, Device B, and Device C. Each router maintains an address
mapping table that records the mapping between the inbound and outbound interfaces. FR
frames are received from the inbound interface and sent by the outbound interface to the next
router. Transit devices can be configured and connected through VCs on the FR network.
Two devices across an FR network can be connected through a PVC consisting of multiple
VCs, (each VC is identified by a DLCI). Figure 1-555 shows how an FR frame is forwarded
along a PVC:
1. The source device sends an FR frame from port 1 along the VC specified by DLCI 1.
2. After receiving the FR frame from port 1, Device A sends it through port 2 along the VC
specified by DLCI 2.
3. After receiving the FR frame from port 0, Device B sends it through port 1 along the VC
specified by DLCI 3.
4. After receiving the FR frame from port 1, Device C sends it to the receiver through port
0 along the VC specified by DLCI 4.
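The per-hop DLCI rewriting in the steps above can be modeled as a mapping table per device. This is an illustrative sketch: the table entries mirror the ports and DLCIs described for Figure 1-555, and the data structure is invented for clarity.

```python
# Each device maps (inbound port, inbound DLCI) -> (outbound port, outbound DLCI).
SWITCHING_TABLES = {
    "DeviceA": {(1, 1): (2, 2)},
    "DeviceB": {(0, 2): (1, 3)},
    "DeviceC": {(1, 3): (0, 4)},
}

def forward(device: str, in_port: int, in_dlci: int):
    """Look up the outbound interface and the rewritten DLCI for the next hop."""
    return SWITCHING_TABLES[device][(in_port, in_dlci)]
```

Chaining the lookups reproduces the path: a frame entering Device A on port 1 with DLCI 1 leaves Device C on port 0 with DLCI 4.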
1.8.4.2.4 FR Sub-interfaces
Background
An FR sub-interface is a logical interface configured on a physical interface. FR
sub-interfaces reduce the number of physical interfaces and deployment costs as well as the
impact of split horizon.
An FR network interconnects networks in different geographical locations using a star,
full-mesh, or partial-mesh network topology.
The star topology requires the least number of PVCs and is the most cost-effective. In the star
topology, PVCs are configured on an interface of the central node for communication with
different branch nodes. The star topology is an ideal option when a headquarters and its
branch offices need to be interconnected. The disadvantage of the star topology is that packets
exchanged between branch nodes have to pass through the central node.
In a full-mesh topology, each two nodes are connected using PVCs and exchange packets
directly. This topology ensures high transmission reliability because packets can be switched
to other PVCs if the direct PVC between two nodes fails. However, the full-mesh topology
suffers from the "N square" problem and requires a large number of PVCs.
In a partial-mesh topology, only some nodes have PVCs to other nodes. An FR network is of
the non-broadcast multiple access (NBMA) type by default; unlike an Ethernet network, an
FR network does not support broadcast. A node on the FR network must therefore duplicate
a received route and send a copy to each of the other nodes over a separate PVC.
To avoid loops, split horizon is deployed to prevent an interface from sending received
routing information.
On the network shown in Figure 1-556, Device B sends a route to a POS interface of Device
A. Due to split horizon, Device A cannot send the route to Device C or Device D through the
POS interface. To resolve this problem, any of the following solutions can be used:
Use multiple physical interfaces to connect two neighboring devices. This solution is not
cost-efficient because each device needs to provide multiple physical interfaces.
Configure multiple sub-interfaces on a physical interface. Then assign a network address
to each sub-interface so that they can function as multiple physical interfaces.
Disable split horizon. This solution increases the possibility of routing loops.
Implementation
FR can be deployed on interfaces or sub-interfaces, and multiple sub-interfaces can be
configured on one interface. Although sub-interfaces are logical, they function like
physical interfaces at the network layer. Protocol addresses and VCs can be configured
on the sub-interfaces for communication with other devices.
On the network shown in Figure 1-557, three sub-interfaces (POS 3/0/1.1, POS 3/0/1.2, and
POS 3/0/1.3) are configured on a POS interface of Device A. Each sub-interface is connected
to a remote device through a VC. POS 3/0/1.1 is connected to Device B, POS 3/0/1.2 is
connected to Device C, and POS 3/0/1.3 is connected to Device D.
With the preceding configurations, the FR network is partially meshed. Devices can therefore
exchange update messages with each other, overcoming the limitations of split horizon.
Benefits
FR sub-interfaces reduce deployment costs.
1.8.4.3 Applications
1.8.4.3.1 FR Access
A typical FR application is FR access. FR access allows upper-layer packets to be transmitted
over an FR network.
An FR network allows user devices, such as routers and hosts, to exchange data.
1.8.4.4 Impact
1.8.4.4.1 On System Performance
None
Terms
Term Definition
X.25 A data link layer protocol that defines how to maintain connections
between DTE and DCE devices for remote terminal access and PC
communication on a PDN.
Sub-interface A logical interface configured on a physical interface to facilitate
service deployment.
Definition
As a bit-oriented link layer protocol, HDLC transparently transmits bit flows of any type
without requiring the data to be a set of characters.
Trunk technology aggregates multiple physical interfaces into an aggregation group to
load-balance received and sent data among these interfaces and to provide highly
reliable connections.
HDLC
Compared with other data link layer protocols, HDLC has the following features:
Full-duplex communication: data can be sent continuously without waiting for
acknowledgement, giving high data transmission efficiency.
All frames are protected by a cyclic redundancy check (CRC), and information frames
are numbered. This prevents information frames from being lost or received repeatedly,
improving transmission reliability.
The transmission control function is separated from the processing function, so HDLC
has high flexibility and excellent control capabilities.
HDLC does not depend on any character set and can transmit data transparently.
Zero-bit insertion, which is used for transparent transmission, is easy to implement
in hardware.
1.8.5.2 Principles
1.8.5.2.1 HDLC Principles
Background
Synchronous data link protocols include character-oriented, bit-oriented, and byte-oriented
protocols.
IBM put forward the first character-oriented synchronous protocol, called Binary
Synchronous Communication (BISYNC or BSC).
Later, ISO put forward related standards. The ISO standard is ISO 1745:1975 Information
processing - Basic mode control procedures for data communication systems.
In the early 1970s, IBM introduced the bit-oriented Synchronous Data Link Control (SDLC)
protocol.
Later, ANSI and ISO adopted and developed SDLC, and then later put forward their own
standards. ANSI introduced the Advanced Data Communications Control Protocol (ADCCP),
and ISO introduced HDLC.
HDLC Features
HDLC is a bit-oriented code-transparent synchronous data link layer protocol. It provides the
following features:
HDLC works in full-duplex mode and can transmit data continuously without waiting for
acknowledgement. Therefore, HDLC features high data link transmission efficiency.
HDLC uses cyclic redundancy check (CRC) for all frames and numbers them. This helps
you know which frames are dropped and which frames are repeatedly transmitted.
HDLC ensures high transmission reliability.
HDLC separates the transmission control function from the processing function and
features high flexibility and perfect control capabilities.
HDLC is independent of any character encoding set and transparently transmits data.
Zero-bit insertion, which is used for transparent data transmission, is easy to implement
on hardware.
HDLC is especially suitable for transmitting data that is segmented into physical blocks or
packages. These blocks or packages are called frames, each of which is identified by a start
flag and an end flag. In HDLC, all bit-oriented data link control protocols use a unified frame
format, and both data and control information are transmitted in frames. Each frame begins
and ends with a frame delimiter, a unique bit sequence of 01111110. The frame delimiter
marks the start or end of a frame or is used for synchronization. The delimiter sequence must
not appear inside a frame, to avoid confusion.
Zero-bit insertion ensures that the bit sequence used for the flag does not appear in normal
data. On the transmit end, zero-bit insertion monitors all fields except the flag and inserts a
0 after five consecutive 1s. On the receive end, zero-bit insertion also monitors all fields
except the flag. After five consecutive 1s are found, if the following bit is a 0, the 0 is
automatically deleted to restore the original bit flow. If the following bit is a 1, either an
error has occurred or an end delimiter has been received. In this case, the frame receive
procedure is generally restarted or aborted.
Introduction
Nodes on a network running HDLC are called stations. HDLC specifies three types of stations:
primary, secondary, and combined.
A primary station is the controlling station on a link. It controls the secondary stations on the
link and manages data flow and error recovery.
A secondary station is present on a link where there is a primary station. The secondary
station is controlled by the primary station, and has no direct responsibility for controlling the
link. Under normal circumstances, a secondary station will transfer frames only when
requested to do so by the primary station, and will respond only to the primary station.
A combined station is a combination of primary and secondary stations.
Frames transferred by a primary station to a secondary station are called commands, and
frames transferred by a secondary station to a primary station are called responses.
On a point to multipoint (P2MP) link, there is a primary station and several secondary stations.
The primary station polls the secondary stations to determine whether they have data to
transmit, and then selects one to transmit its data. On a point to point (P2P) link, both ends
can be combined stations. If a node is connected to multiple links, the node can be the primary
station for some links and a secondary station for the other links.
A complete HDLC frame consists of several fields, such as the Flag field, Address field,
Control field, Information field, and Frame check sequence (FCS) field. Figure 1-559 shows
the format of a complete HDLC frame.
1.8.5.2.5 IP-Trunk
A trunk can aggregate many interfaces into an aggregation group to implement load balancing
on member interfaces. Therefore, link connectivity is of higher reliability. Trunk interfaces are
classified as Eth-Trunk interfaces and IP-Trunk interfaces. An IP-Trunk can only be composed
of POS links. It has the following characteristics:
Increased bandwidth: An IP-Trunk obtains the sum of bandwidths of all member
interfaces.
Improved reliability: When a link fails, traffic is automatically switched to other links,
which improves connection reliability.
Member interfaces of an IP-Trunk interface must be encapsulated with HDLC. IP-Trunk and
Eth-Trunk technologies have similar principles. For details, see the chapter about trunk in the
NE20E Feature Description - LAN Access and MAN Access.
Background
Due to unstable signals on physical links or incorrect configurations at the data link layer on
live networks, an interface on which High-Level Data Link Control (HDLC) is enabled may
frequently experience HDLC negotiation, and the HDLC protocol status of the interface may
alternate between Up and Down, causing routing protocol or MPLS flapping. As a result,
devices and networks are severely affected. Worse still, devices may be paralyzed and networks
may become unavailable.
HDLC flapping suppression restricts the frequency at which the HDLC protocol status of an
interface alternates between Up and Down. This restriction minimizes the impact of flapping
on devices and networks.
Implementation Principles
HDLC flapping suppression involves the following concepts:
Penalty value: This value is calculated based on the HDLC protocol status of the
interface using the suppression algorithm. The core of the suppression algorithm is that
the penalty value increases each time the interface status changes and decreases
exponentially over time.
Suppression threshold: The HDLC protocol status of an interface remains Down when
the penalty value is greater than the suppression threshold.
Reuse threshold: The HDLC protocol status of an interface is no longer suppressed when
the penalty value is smaller than the reuse threshold.
Ceiling threshold: The penalty value no longer increases when the penalty value reaches
the ceiling threshold, preventing the HDLC protocol status of an interface from being
suppressed for a long time. The ceiling value can be calculated using the following
formula: ceiling = reuse × 2^(MaxSuppressTime/HalfLifeTime).
Half-life-period: period in which the penalty value decreases by half. A half-life-period
begins to elapse when the HDLC protocol status of an interface goes Down for the first
time. Each time a half-life-period elapses, the penalty value decreases by half, and
another half-life-period begins.
Max-suppress-time: maximum period during which the HDLC protocol status of an
interface is suppressed. After a max-suppress-time elapses, the HDLC protocol status of
the interface is renegotiated and reported.
Figure 1-560 shows the relationships between these parameters.
At t1, the HDLC protocol status of an interface goes Down, and its penalty value increases by
1000. Then, the interface goes Up, and its penalty value decreases exponentially based on the
half-life rule. At t2, the HDLC protocol status of the interface goes Down again, and its
penalty value increases by 1000, reaching 1600, which has exceeded the suppression
threshold of 1500. The HDLC protocol status of the interface is therefore suppressed. As the
interface keeps flapping, its penalty value keeps increasing until it reaches the ceiling
threshold of 10000 at tA. As time goes by, the penalty value decreases and reaches the reuse
value of 750 at tB. The HDLC protocol status of the interface is then no longer suppressed.
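The parameter interplay described above can be modeled in a short sketch (a minimal model only; the 1000-point penalty per Down event and the threshold values follow the example figures in the text and are not device defaults):

```python
def decayed(penalty: float, elapsed: float, half_life: float) -> float:
    """Exponential decay: the penalty halves every half_life seconds."""
    return penalty * 0.5 ** (elapsed / half_life)

class FlapSuppressor:
    """Sketch of HDLC flapping suppression with assumed example values."""
    def __init__(self, reuse=750.0, suppress=1500.0,
                 half_life=15.0, max_suppress_time=60.0):
        self.reuse, self.suppress, self.half_life = reuse, suppress, half_life
        # ceiling = reuse * 2^(MaxSuppressTime / HalfLifeTime)
        self.ceiling = reuse * 2 ** (max_suppress_time / half_life)
        self.penalty, self.last, self.suppressed = 0.0, 0.0, False

    def on_down(self, now: float) -> None:
        """Each Down event adds 1000 to the decayed penalty, capped at ceiling."""
        self.penalty = min(self.ceiling,
                           decayed(self.penalty, now - self.last,
                                   self.half_life) + 1000.0)
        self.last = now
        if self.penalty > self.suppress:
            self.suppressed = True

    def is_suppressed(self, now: float) -> bool:
        """Suppression ends once the penalty decays below the reuse threshold."""
        p = decayed(self.penalty, now - self.last, self.half_life)
        if self.suppressed and p < self.reuse:
            self.suppressed = False
        return self.suppressed
```

A first Down event (penalty 1000) does not cross the suppression threshold; a second one soon after does, and the interface stays suppressed until the penalty decays below the reuse value.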
1.8.5.3 Applications
HDLC
IP-Trunk
For an IP-Trunk interface, you can configure weights for member interfaces to implement
load balancing among member interfaces. There are two load balancing modes, namely,
per-destination and per-packet load balancing.
Per-destination load balancing: packets with the same source and destination IP
addresses are transmitted over one member link.
Per-packet load balancing: packets are transmitted over different member links.
As shown in Figure 1-562, two routers are connected through POS interfaces that are bundled
into an IP-Trunk interface to transmit IPv4, IPv6, and MPLS packets.
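The two load balancing modes above can be illustrated with a minimal sketch (the hash function and names are our assumptions; real hardware uses its own hash inputs and member selection logic):

```python
import zlib
from itertools import count

def per_destination_link(src_ip: str, dst_ip: str, num_links: int) -> int:
    """Per-destination mode: hash the (source, destination) pair so packets
    with the same addresses always leave on the same member link."""
    return zlib.crc32(f"{src_ip}->{dst_ip}".encode()) % num_links

_rr = count()  # illustrative round-robin counter

def per_packet_link(num_links: int) -> int:
    """Per-packet mode: spread successive packets across the member links."""
    return next(_rr) % num_links
```

Per-destination balancing preserves packet order within a flow; per-packet balancing spreads load more evenly but may reorder packets of one flow.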
Terms
Term Definition
Aggregation: Two or more interfaces are bundled together so that they function as a single interface for load balancing and link protection.
Inter-board aggregation: Interfaces on different boards are bundled together to form a link aggregation group to improve the reliability of the link aggregation group.
Bundling: Two boards can be bundled together and considered as one board.
Load balancing: Member interfaces in a link aggregation group are selected as outbound interfaces for packets based on the packets' source and destination MAC addresses.
Definition
The Point-to-Point Protocol (PPP) is a link-layer protocol used to transmit point-to-point (P2P)
data over full-duplex synchronous and asynchronous links.
PPP negotiation involves the following items:
Data encapsulation mode: defines how to encapsulate multi-protocol data packets.
Link Control Protocol (LCP): used to set up, monitor, and tear down data links.
Network Control Protocol (NCP): used to negotiate options for a network layer protocol
running atop PPP and the format and type of the data to be transmitted over data links.
PPP uses the Password Authentication Protocol (PAP) and Challenge Handshake
Authentication Protocol (CHAP) to secure network communication.
If carriers have high bandwidth requirements, bundle multiple PPP links into an MP link to
increase link bandwidth and improve link reliability.
Purpose
PPP, which works at the second layer (data link layer) of the open systems interconnection
(OSI) model, is mainly used on links that support full-duplex to transmit data. PPP is widely
used because it provides user authentication, supports synchronous and asynchronous
communication, and is easy to extend.
PPP was developed based on the Serial Line Internet Protocol (SLIP) and overcomes the
shortcomings of SLIP, which transmits only IP packets and does not support
negotiation. Compared with other link-layer protocols, PPP has the following advantages:
PPP supports both synchronous and asynchronous links, whereas SLIP supports only
asynchronous links, and other link-layer protocols, such as X.25, support only
synchronous links.
PPP is highly extensible.
PPP uses a Link Control Protocol (LCP) to negotiate link-layer parameters.
PPP uses a Network Control Protocol (NCP), such as the IP Control Protocol (IPCP) or
Internetwork Packet Exchange Control Protocol (IPXCP), to negotiate network-layer
parameters.
PPP supports Password Authentication Protocol (PAP) and Challenge Handshake
Authentication Protocol (CHAP) which improve network security.
PPP does not have a retransmission mechanism, which reduces network costs and speeds
up packet transmission.
1.8.6.2 Principles
1.8.6.2.1 PPP Basic Concepts
PPP Architecture
PPP works at the network access layer of the Transmission Control Protocol (TCP)/IP suite
for point-to-point (P2P) data transmission over full-duplex synchronous and asynchronous
links.
Information field
The Information field contains the data. The maximum length of the Information field,
including the Padding content, is equivalent to the maximum receive unit (MRU) length.
The MRU defaults to 1500 bytes and can be negotiated.
In the Information field, the Padding content is optional. If data is padded, the
communicating devices can communicate only when they can identify the padding
information as well as the payload to be transmitted.
Frame check sequence (FCS) field
The FCS field checks whether PPP packets contain errors.
Some mechanisms used to ensure proper data transmission increase the transmission cost
and cause delay in data exchange at the application layer.
Identifier field
The Identifier field is 1 byte long. It is used to match requests and replies. If a packet
with an invalid Identifier field is received, the packet is discarded.
The sequence number of a Configure-Request packet usually starts at 0x01 and increases
by 1 each time the Configure-Request packet is sent. After a receiver receives a
Configure-Request packet, it must send a reply packet with the same sequence number as
the received Configure-Request packet.
Length field
The Length field specifies the length of a negotiation packet, including the length of the
Code, Identifier, Length, and Data fields.
The Length field value cannot exceed the MRU of the link. Bytes outside the range of
the Length field are treated as padding and are ignored after they are received.
Data field
The Data field contains the contents of a negotiation packet and includes the following
fields:
− Type field: specifies the negotiation option type.
− Length field: specifies the total length of the Data field.
− Data field: contains the contents of the negotiation option.
0x01 Maximum-Receive-Unit
0x02 Async-Control-Character-Map
0x03 Authentication-Protocol
0x04 Quality-Protocol
0x05 Magic-Number
0x06 RESERVED
0x07 Protocol-Field-Compression
0x08 Address-and-Control-Field-Compression
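Each negotiation option in the Data field is a Type/Length/Data triple whose Length covers the Type and Length octets as well. As a hedged sketch (the function name is ours), an encoder for the option types listed above:

```python
def lcp_option(opt_type: int, data: bytes) -> bytes:
    """Encode one LCP option as Type/Length/Data; per RFC 1661 the Length
    octet counts the Type and Length octets plus the Data."""
    return bytes([opt_type, 2 + len(data)]) + data

# Maximum-Receive-Unit (type 0x01) carrying MRU = 1500 (0x05DC)
mru_option = lcp_option(0x01, (1500).to_bytes(2, "big"))
```

Several such options are concatenated into the Data field of a Configure-Request packet.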
1. Two devices enter the Establish phase if one of them sends a PPP connection request to
the other.
2. In the Establish phase, the two devices perform an LCP negotiation to negotiate the
working mode, maximum receive unit (MRU), authentication mode, and magic number.
The working mode can be either Single-Link PPP (SP) or Multilink PPP (MP). If the
LCP negotiation succeeds, LCP enters the Opened state, which indicates that a
lower-layer link has been established.
3. If authentication is configured, the two devices enter the Authentication phase and
perform Password Authentication Protocol (PAP) or Challenge Handshake
Authentication Protocol (CHAP) authentication. If no authentication is configured, the
two devices enter the Network phase.
4. In the Authentication phase, if PAP or CHAP authentication fails, the two devices enter
the Terminate phase. The link is torn down and LCP enters the Down state. If PAP or
CHAP authentication succeeds, the two devices enter the Network phase, and LCP
remains in the Opened state.
5. In the Network phase, the two devices perform an NCP negotiation to select a
network-layer protocol and to negotiate network-layer parameters. After the two devices
succeed in negotiating a network-layer protocol, packets can be sent over this PPP link
using the network-layer protocol.
Various control protocols, such as IP Control Protocol (IPCP) and Multiprotocol Label
Switching Control Protocol (MPLSCP), can be used in NCP negotiation. IPCP mainly
negotiates the IP addresses of the two devices.
6. If the PPP connection is interrupted during PPP operation, for example, if the physical
link is disconnected, the authentication fails, the negotiation timer expires, or the
connection is torn down by the network administrator, the two devices enter the
Termination phase.
7. In the Termination phase, the two devices release all resources and enter the Dead phase.
The two devices remain in the Dead phase until a new PPP connection is established
between them.
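The phase walk-through above can be summarized as a small state table (the event names are our own labels for the conditions described in steps 1-7, not protocol messages):

```python
# Assumed simplification of the PPP phase transitions described above.
TRANSITIONS = {
    ("Dead", "link_up"): "Establish",
    ("Establish", "lcp_opened_auth"): "Authenticate",
    ("Establish", "lcp_opened_no_auth"): "Network",
    ("Establish", "lcp_failed"): "Dead",
    ("Authenticate", "auth_ok"): "Network",
    ("Authenticate", "auth_fail"): "Terminate",
    ("Network", "close"): "Terminate",
    ("Terminate", "done"): "Dead",
}

def next_phase(phase: str, event: str) -> str:
    """Return the next phase; unknown events leave the phase unchanged."""
    return TRANSITIONS.get((phase, event), phase)
```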
Dead Phase
The physical layer is unavailable during the Dead phase. A PPP link begins and ends with this
phase.
When two devices detect that the physical link between them has been activated, for example,
when carrier signals are detected on the physical link, the two devices move from the Dead
phase to the Establish phase.
After the PPP link is terminated, the two devices enter the Dead phase.
Establish Phase
In the Establish phase, the two devices perform an LCP negotiation to negotiate the working
mode (SP or MP), MRU, authentication mode, and magic number. After the LCP negotiation
is complete, the two devices enter the next phase.
In the Establish phase, the LCP status changes as follows:
If the link is unavailable (in the Dead phase), LCP is in the Initial or Starting state. When
the physical layer detects that the link is available, the physical layer sends an Up event
to the link layer. Upon receipt, the link layer changes the LCP status to Request-Sent.
Then, the devices at both ends send Configure-Request packets to each other to
configure a data link.
If the local device first receives a Configure-Ack packet from the peer, the LCP status
changes from Request-Sent to Ack-Received. After the local device sends a
Configure-Ack packet to the peer, the LCP status changes from Ack-Received to Open.
If the local device first sends a Configure-Ack packet to the peer, the LCP status changes
from Request-Sent to Ack-Sent. After the local device receives a Configure-Ack packet
from the peer, the LCP status changes from Ack-Sent to Open.
After LCP enters the Open state, the next phase starts.
The next phase is the Authentication or Network phase, depending on whether authentication
is required.
Authentication Phase
The Authentication phase is optional. By default, PPP does not perform authentication during
PPP link establishment. If authentication is required, the authentication protocol must be
specified in the Establish phase.
PPP provides two password authentication modes: PAP authentication and CHAP
authentication.
Two authentication methods are available: unidirectional authentication and bidirectional authentication.
In unidirectional authentication, the device on one end functions as the authenticating device, and the
device on the other end functions as the authenticated device. In bidirectional authentication, each
device functions as both the authenticating and authenticated device. In practice, unidirectional
authentication is typically used.
1. The authenticated device sends the local user name and password to the authenticating
device.
2. The authenticating device checks whether the received user name is in the local user list.
− If the received user name is in the local user list, the authenticating device checks
whether the received password is correct.
If the password is correct, the authentication succeeds.
If the password is incorrect, the authentication fails.
− If the received user name is not in the local user list, the authentication fails.
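The PAP check in the steps above amounts to a lookup and a comparison; a minimal sketch, assuming a hypothetical local user table:

```python
# Hypothetical local user list; real devices consult their AAA configuration.
LOCAL_USERS = {"user1": "secret123"}

def pap_authenticate(username: str, password: str) -> bool:
    """PAP check as described above: the username must exist in the local
    user list and the password must match."""
    if username not in LOCAL_USERS:
        return False  # username not in the local user list
    return LOCAL_USERS[username] == password
```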
PAP Packet Format
A PAP packet is encapsulated into the Information field of a PPP packet with the Protocol
field value 0xC023. Figure 1-567 shows the PAP packet format.
In PAP authentication, passwords are sent over links in simple text. After a PPP link is established,
the authenticated device repeatedly sends the user name and password until authentication finishes.
PAP authentication is used on networks that do not require high security.
CHAP is a three-way handshake authentication protocol. In CHAP authentication, the authenticated
device sends only a user name to the authenticating device. Compared with PAP, CHAP features
higher security because passwords are not transmitted. CHAP authentication is used on networks
that require high security.
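CHAP avoids transmitting the password by exchanging a digest instead. Per RFC 1994, the response is an MD5 hash over the packet Identifier, the shared secret, and the challenge; a sketch:

```python
import hashlib

def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    """CHAP response (RFC 1994): MD5(Identifier || secret || challenge).
    The secret itself never crosses the link; only this digest does."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()

def chap_verify(identifier: int, secret: bytes, challenge: bytes,
                response: bytes) -> bool:
    """The authenticating device recomputes the digest with its own copy
    of the secret and compares it with the received response."""
    return chap_response(identifier, secret, challenge) == response
```

Because the challenge changes on every authentication, a captured response cannot be replayed later.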
Network Phase
In the Network phase, NCP negotiation is performed to select a network-layer protocol and to
negotiate network-layer parameters. An NCP can enter the Open or Closed state at any time.
After an NCP enters the Open state, network-layer data can be transmitted over the PPP link.
Termination Phase
PPP can terminate a link at any time. A link can be terminated manually by an administrator
or be terminated due to carrier loss, an authentication failure, or other causes.
Background
When two devices are connected through interfaces over an intermediate transmission
device, the connection may be adjusted if it is found to be incorrect during traffic
transmission. However, the interfaces cannot detect the adjustment because they do not go
Down, and therefore LCP renegotiation is not triggered. In addition, PPP
allows the interfaces to learn 32-bit host routes from each other only during LCP
negotiation. As a result, the interfaces continue to transmit traffic using the host routes
learned over the original connection even after the connection changes, and traffic is
transmitted incorrectly.
To address this issue, deploy PPP magic number check on these devices. Even if the interfaces
do not detect the connection change, PPP magic number check can trigger LCP renegotiation.
The interfaces then re-learn the host routes from each other.
Principles
Magic numbers are generated independently by communication devices. To prevent devices
from generating identical magic numbers, each device randomly generates a unique magic
number based on its serial number, hardware address, or clock.
Devices negotiate their magic numbers during LCP negotiation and send Echo packets
carrying their negotiated magic numbers to their peers after the LCP negotiation.
In Figure 1-570, Device A and Device B are connected over a transmission device, and
Device C and Device D are also connected over this transmission device. PPP connections
have been established, and LCP negotiation is complete between Device A and Device B and
between Device C and Device D. If the connections are found incorrect, an adjustment is
required to establish a PPP connection between Device A and Device C. In this situation, PPP
magic number check can be used to trigger the LCP renegotiation as follows:
1. Device A sends to Device C an Echo-Request packet carrying Device A's negotiated
magic number.
2. When receiving the Echo-Request packet, Device C compares the magic number carried
in the packet with its peer's negotiated magic number (Device D's). The magic numbers
are different, and the error counter on Device C increases by one.
3. Device C replies to Device A with an Echo-Reply packet carrying Device D's negotiated
magic number.
4. When receiving the Echo-Reply packet, Device A compares the magic number carried in
the packet with the local magic number. The magic numbers are different. Device A then
compares the magic number in the packet with its peer's negotiated magic number
(Device B's). The magic numbers are also different, and the error counter on Device A
increases by one.
5. The preceding steps are repeated. If the error counter reaches a specified value, LCP
goes Down, and LCP renegotiation is triggered.
Figure 1-570 shows the connection status before LCP renegotiation. Device A and Device C still use the
local and peer's magic numbers that are negotiated previously. These magic numbers are not updated
until the LCP renegotiation.
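The per-Echo check in steps 1-5 can be sketched as one update step (the threshold value is an assumption; the text does not specify it):

```python
def check_echo_magic(received_magic: int, peer_magic: int,
                     error_count: int, threshold: int = 5):
    """Compare the magic number carried in an Echo packet with the peer's
    negotiated magic number. A mismatch bumps the error counter; reaching
    the (assumed) threshold triggers LCP renegotiation."""
    if received_magic != peer_magic:
        error_count += 1
    renegotiate = error_count >= threshold
    return error_count, renegotiate
```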
Background
Due to unstable signals on physical links or incorrect configurations at the data link layer on
live networks, PPP-capable interfaces may frequently experience PPP negotiation, and the
PPP protocol status of these interfaces may alternate between Up and Down, causing routing
protocol or MPLS flapping. As a result, devices and networks are severely affected. Worse
still, devices may be paralyzed and networks may become unavailable.
PPP flapping suppression restricts the frequency at which the PPP protocol status of an
interface alternates between Up and Down. This restriction minimizes the impact of flapping
on devices and networks.
Implementation Principles
PPP flapping suppression involves the following concepts:
Penalty value: This value is calculated based on the PPP protocol status of the interface
using the suppression algorithm. The core of the suppression algorithm is that the penalty
value increases each time the interface status changes and decreases exponentially over
time.
Suppression threshold: The PPP protocol status of an interface is suppressed and remains
Down when the penalty value is greater than the suppression threshold.
Reuse threshold: The PPP protocol status of an interface is no longer suppressed when
the penalty value is smaller than the reuse threshold.
Ceiling threshold: The penalty value no longer increases when the penalty value reaches
the ceiling threshold, preventing the PPP protocol status of an interface from being
suppressed for a long time. The ceiling value can be calculated using the following
formula: ceiling = reuse × 2^(MaxSuppressTime/HalfLifeTime).
Half-life-period: period that the penalty value takes to decrease to half. A half-life-period
begins to elapse when the PPP protocol status of an interface goes Down for the first
time. If a half-life-period elapses, the penalty value decreases to half, and another
half-life-period begins.
Max-suppress-time: maximum period during which the PPP protocol status of an
interface is suppressed. After a max-suppress-time elapses, the PPP protocol status of the
interface is renegotiated and reported.
Figure 1-571 shows the relationships between these parameters.
At t1, the PPP protocol status of an interface goes Down, and its penalty value increases by
1000. Then, the interface goes Up, and its penalty value decreases exponentially based on the
half-life rule. At t2, the PPP protocol status of the interface goes Down again, and its penalty
value increases by 1000, reaching 1600, which has exceeded the suppression threshold of
1500. The PPP protocol status of the interface is therefore suppressed. As the interface keeps
flapping, its penalty value keeps increasing until it reaches the ceiling threshold of 10000 at
tA. As time goes by, the penalty value decreases and reaches the reuse value of 750 at tB. The
PPP protocol status of the interface is then no longer suppressed.
1.8.6.2.5 MP Principles
Principles
The Multilink protocol bundles multiple PPP links into an MP link to increase link bandwidth
and reliability. MP fragments packets exceeding the maximum transmission unit (MTU) and
sends these fragments to the PPP peer over the PPP links in the MP-group. The PPP peer then
reassembles these fragments into packets and forwards these packets to the network layer. For
packets that do not exceed the MTU, MP directly sends these packets over the PPP links in
the MP-group to the PPP peer, which in turn forwards these packets to the network layer.
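The fragmentation step above can be sketched as follows (illustrative only; real MP prepends an MP header with sequence numbers to each fragment so the peer can reassemble them in order):

```python
def mp_fragment(packet: bytes, mtu: int) -> list:
    """Split a packet into MTU-sized pieces for the member links;
    packets that fit within the MTU are sent whole."""
    if len(packet) <= mtu:
        return [packet]
    return [packet[i:i + mtu] for i in range(0, len(packet), mtu)]
```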
Implementation
An MP-group interface is dedicated to MP applications. MP is implemented by adding
multiple interfaces to an MP-group interface.
MP negotiation involves:
LCP negotiation: Devices on both ends negotiate LCP parameters and check whether
they both work in MP mode. If they work in different working modes, LCP negotiation
fails.
Network Control Protocol (NCP) negotiation: Devices on both ends perform NCP
negotiation by using only NCP parameters (such as IP addresses) of the MP-group
interfaces but not using the NCP parameters of physical interfaces.
If NCP negotiation succeeds, an MP link is established.
Benefits
MP provides the following benefits:
Increased bandwidth
Load balancing
Link backup
Reduced delay through packet fragmentation
1.8.6.3 Applications
1.8.6.3.1 MP Applications
A single PPP link can provide only limited bandwidth. To increase link bandwidth and
reliability, bundle multiple PPP links into an MP link.
As shown in Figure 1-572, there are two PPP links between Device A and Device B. The two
PPP links are bundled into an MP link by creating an MP-group interface. The MP link
provides higher bandwidth than a single PPP link. If one PPP link in the MP group fails,
communication over the other PPP link is not affected.
Terms
None
Definition
Carrier-class networks require high reliability for IP devices. IP devices are required to rapidly
detect faults.
When the fast detection function is enabled on an interface, alarm reporting becomes faster.
This may cause the physical status of the interface to switch between Up and Down. As a
result, the network flaps frequently.
Therefore, alarms must be filtered and suppressed to prevent frequent network flapping.
Transmission alarm suppression can efficiently filter and suppress alarm signals to prevent
interfaces from frequently flapping. In addition, transmission alarm customization can control
the impact of alarms on the interface status.
Transmission alarm customization and suppression provide the following functions:
The Transmission alarm customization function allows you to specify alarms that can
cause the physical status of an interface to change. This function helps filter out
unwanted alarms.
The Transmission alarm suppression function allows you to suppress network flapping
by setting a series of thresholds.
Purpose
Transmission alarm customization allows you to filter unwanted alarms, and transmission
alarm suppression enables you to set thresholds on customized alarms, allowing devices to
ignore burrs generated during transmission link protection and preventing frequent network
flapping.
On a backbone network or a MAN, IP devices are connected to transmission devices, such as
synchronous digital hierarchy (SDH) and Synchronous Optical Network (SONET) devices.
When transmission devices become faulty, IP devices will receive alarms. Then, faulty
transmission devices perform link switchovers and the alarms disappear. After an alarm is
generated, a link switchover lasts 50 ms to 200 ms. In the log information on IP devices, the
transmission alarms are displayed as burrs that last 50 ms to 200 ms. These burrs will cause
the interface status of IP devices to switch frequently. IP devices will perform route
calculation frequently. As a result, routes flap frequently, affecting the performance of IP
devices.
From the perspective of the entire network, IP devices are expected to ignore such burrs. That
is, IP devices must customize and suppress the alarms that are generated during transmission
device maintenance or link switchovers. This can prevent route flapping. Transmission alarm
customization can control the impact of transmission alarms on the physical status of
interfaces. Transmission alarm suppression can efficiently filter and suppress specific alarm
signals to avoid frequent interface flapping.
1.8.7.2 Principles
1.8.7.2.1 Basic Concepts
Network Flapping
Network flapping occurs when the physical status of interfaces on a network frequently
alternates between Up and Down.
Alarm Burrs
An alarm burr is a process in which alarm generation and alarm clearance signals are received
in a short period (The period varies with specific usage scenarios, devices, or service types).
For example, if a loss of signal (LOS) alarm is cleared 50 ms after it is generated, the process
from the alarm generation to clearance is an alarm burr.
Alarm Flapping
Alarm flapping is a process in which an alarm is repeatedly generated and cleared in a short
period (The period varies with specific usage scenarios, devices, or service types).
For example, if an LOS alarm is generated and cleared 10 times in 1s, alarm flapping occurs.
suppress: alarm suppression threshold. When the figure of merit value exceeds this
threshold, alarms are suppressed. This value must be smaller than the ceiling value and
greater than the reuse value.
ceiling: maximum value of figure of merit. When an alarm is repeatedly generated and
cleared in a short period, figure of merit significantly increases and, therefore, takes a
long time to return to reuse. To avoid long delays returning to reuse, a ceiling value can
be set to limit the maximum value of figure of merit. figure of merit does not increase
when it reaches the ceiling value.
reuse: alarm reuse threshold. When figure of merit falls below this value, alarms are no
longer suppressed. This value must be smaller than the suppress value.
half-time: time used by figure of merit of suppressed alarms to decrease to half.
decay-ok: time used by figure of merit to decrease to half when an alarm clearance
signal is received.
decay-ng: time used by figure of merit to decrease to half when an alarm generation
signal is received.
Figure 1-573 shows how figure of merit increases and decreases as a transmission device
sends alarm generation signals.
1. At t1 and t2, figure of merit is smaller than suppress. Therefore, alarm signals
generated at t1 and t2 affect the physical status of the interface, and the physical status of
the interface changes to Down.
2. At t3, figure of merit exceeds suppress, and the alarm is suppressed. The physical
status of the interface is not affected, even if new alarm signals arrive.
3. At t4, figure of merit reaches ceiling. If new alarm signals arrive, figure of merit is
recalculated but does not exceed ceiling.
4. At t5, figure of merit falls below reuse, and the alarm is free from suppression.
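The figure-of-merit bookkeeping described above can be sketched as one update step (the numeric defaults follow the example figures in the text and are assumptions; half_time stands in for decay-ok or decay-ng depending on the last signal seen):

```python
def update_fom(fom: float, elapsed: float, half_time: float,
               alarm_signal: bool, penalty: float = 1000.0,
               ceiling: float = 10000.0) -> float:
    """Decay figure of merit over 'elapsed' seconds (it halves every
    half_time), then add a penalty if an alarm generation signal arrived,
    capping the result at ceiling."""
    fom *= 0.5 ** (elapsed / half_time)
    if alarm_signal:
        fom = min(ceiling, fom + penalty)
    return fom
```

Comparing the running value against suppress and reuse then decides whether alarm signals are allowed to change the physical status of the interface.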
Terms
None
Definition
The circuit emulation service (CES) technology carries traditional TDM data over a packet
switched network (PSN) and provides end-to-end PDH and SDH data transmission in the
PWE3 architecture.
The pseudo random binary sequence (PRBS) is used to generate random data.
CES service connectivity tests use the PRBS technique to generate a PRBS stream,
encapsulate the PRBS stream into CES packets, send and receive the CES packets over CES
service channels, and calculate the proportion of error bits to the total number of bits to obtain
the bit error rate (BER) of CES service channels for measuring service connectivity.
Purpose
When routers and access devices, such as ATN devices, are connected over a public network,
transmission quality affects service deployment and cutover. To address this problem, use
the NMS to deliver a service connectivity test command after CES services are deployed on
PWs. After the test is complete, the device returns the test result to the NMS. This shortens
service deployment.
Benefits
CES service connectivity tests offer the following benefits to carriers:
Monitors link quality during network cutover and helps identify potential risks,
improving the cutover success ratio and minimizing user complaints about operator
network issues.
Helps speed up service deployment and cutover on a network, shortening the service
launch period.
1.8.8.2 Principles
1.8.8.2.1 Basic Principles
PRBS Stream
CES service connectivity tests use the PRBS technique to generate a PRBS stream,
encapsulate the PRBS stream into CES packets, send and receive the CES packets over CES
service channels, and calculate the proportion of error bits to the total number of bits to obtain
the BER of CES service channels for measuring service connectivity.
A PRBS stream is a pseudo random binary sequence of bits.
1. PRBS stream generation: A PRBS stream is generated by a shift register using a
generator polynomial. The polynomial varies according to the length of the sequence.
2. PRBS stream measurement: Figure 1-574 shows how PRBS stream measurement is
implemented. After the PRBS module of PE1 generates a PRBS stream, the PRBS
stream is encapsulated to CES packets, which are then sent by the network-side
high-speed TX interface to PE2 over a PW. Upon receipt, PE2's line-side E1 interface
performs a local loopback and sends the CES packets through the network-side interface
to PE1's RX interface. After PE1 receives the packets, it compares the sent and received
data and counts the error bits.
3. Bit error insertion during tests: During the tests, bit errors can be inserted to the PRBS
stream. PE1 generates a PRBS stream and inserts bit errors. After the PRBS receive unit
receives bit errors, PE1 can determine the test validity.
4. Test termination by PRBS streams: If a CES service connectivity test lasts for a long
time, you can stop sending and receiving the PRBS stream to terminate the test.
CES service connectivity tests are offline detections and interrupt services. Therefore, this function
applies to site deployment and fault detection after a service interruption.
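As a hedged illustration of step 1, a PRBS can be produced by a linear feedback shift register; the sketch below uses the common PRBS-7 polynomial x^7 + x^6 + 1 (the polynomial actually used depends on the configured sequence length):

```python
def prbs7(nbits: int, seed: int = 0x7F) -> list:
    """Generate a PRBS-7 bit stream with a 7-bit LFSR (taps at bits 7 and 6,
    i.e. polynomial x^7 + x^6 + 1); the sequence repeats every 127 bits."""
    state = seed & 0x7F
    out = []
    for _ in range(nbits):
        newbit = ((state >> 6) ^ (state >> 5)) & 1  # feedback bit
        state = ((state << 1) | newbit) & 0x7F      # shift it into the register
        out.append(newbit)
    return out
```

The receiver runs the same generator and compares bit by bit: any disagreement is counted as a bit error.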
BER Calculation
The BER is calculated using the following equation:
BER = Number of error bits/(Interface rate x Test period)
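The equation above can be evaluated directly; for example (the figures below are illustrative assumptions, using the 2.048 Mbit/s rate of an E1 link):

```python
def bit_error_rate(error_bits: int, interface_rate_bps: float,
                   test_period_s: float) -> float:
    """BER = number of error bits / total bits sent during the test period."""
    return error_bits / (interface_rate_bps * test_period_s)

# 4 error bits over a 10-second test on a 2.048 Mbit/s E1 link
ber = bit_error_rate(4, 2.048e6, 10)  # about 2e-7
```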
1.8.8.3 Applications
1.8.9 CES
1.8.9.1 Introduction
Definition
TDM
Time Division Multiplexing (TDM) divides a channel by time: voice signals are sampled,
and each sampled signal occupies a fixed interval, called a timeslot, in time sequence.
In this way, TDM combines multiple signals into one high-rate composite digital signal
(group signal) with a defined structure, and each signal is transmitted independently.
TDMoPSN
TDM Circuits over Packet Switching Networks (TDMoPSN) is a type of PWE3 service
emulation. TDMoPSN emulates TDM services over a PSN, such as an MPLS or Ethernet
network, thereby transparently transmitting TDM services over the PSN. TDMoPSN is
mainly implemented by means of two protocols: Structure-Agnostic TDM over Packet
(SAToP) and Structure-Aware TDM Circuit Emulation Service over Packet Switched
Network (CESoPSN).
IP RAN
IP RAN is a technology used by mobile carriers to carry wireless services over an IP
network. IP RAN scenarios are complex because different base stations (BSs), interface
technologies, and access and convergence scenarios are involved.
− 2G/2.5G/3G/LTE, traditional BSs/IP BSs, GSM/CDMA, TDM/ATM/IP (interface
technologies) are involved.
− Depending on the BS type, distribution model, network environment, and evolution process, the convergence modes include microwave, MSTP, DSL, PON, and fiber. Services on BSs can be converged directly to the MAN UPE or through convergence gateways (which provide BS convergence, compression optimization, packet gateway, and offload functions).
− Reliability, security, QoS, and operation and maintenance (OM) must be considered in IP RAN scenarios. In some scenarios, transmission efficiency is also a concern.
CEP
Circuit Emulation over Packet (CEP) emulates Synchronous Optical Network
(SONET)/Synchronous Digital Hierarchy (SDH) circuits and services over MPLS. The
emulation signals include:
Purpose
TDMoPSN is a mature solution for accessing and carrying TDM services on a PSN. It is mainly used in IP RAN scenarios to carry wireless services, and to carry fixed-network services between MSAN devices.
Benefits
The TDMoPSN feature offers the following benefits to carriers:
Saves rent for expensive TDM leased lines.
Facilitates smooth evolution of the network.
Simplifies network operations and reduces maintenance cost.
Binds only the useful time slots into packets to improve the resource utilization.
The TDMoPSN feature offers the following benefits to users:
Frees enterprises from paying expensive leased-line rent to fixed-network operators when they access the network for voice services.
1.8.9.2 Principles
1.8.9.2.1 Basic Concepts
TDMoPSN
A TDMoPSN packet, as defined in RFC 4553 (Structure-Agnostic Time Division Multiplexing over Packet), consists of the Ethernet header, the TDMoPSN packet (CESoPSN or SAToP packet), and the FCS.
The Framer on a CPOS interface divides the CPOS signal into 63 E1 signals, which are then encapsulated according to the protocol. Packets received on the network side are decapsulated into E1 signals, multiplexed into CPOS signals by the Framer, and then sent to the CPOS line. Therefore, the implementation of SDH data services in TDMoPSN is similar to that of PDH data services in TDMoPSN.
SAToP
The Structure-Agnostic TDM over Packet (SAToP) function emulates low-rate PDH circuit services.
SAToP carries E1/T1/E3 services in unframed (unstructured) mode. It segments and encapsulates the serial data streams of TDM services and then transmits the encapsulated packets over a PW. SAToP is the simplest method of transparently transmitting low-rate PDH services among TDM circuit emulation schemes.
Clock synchronization
TDMoPSN service packets are transmitted at a constant rate. The local and remote
devices must have synchronized clocks before exchanging TDMoPSN service packets.
Traditional TDM services can synchronize clocks through a physical link but TDMoPSN
services are carried on a PSN. TDM services lose synchronization clock signals when
reaching a downstream PE.
A downstream PE uses either of the following methods to synchronize clocks:
− Obtains clock signals from an external BITS clock.
− Recovers clock signals from packets.
Downstream PEs can extract clock signals from received PWE3 packets by using a clock recovery algorithm. Clock recovery is classified as adaptive clock recovery (ACR) or differential clock recovery (DCR), depending on the implementation.
QoS processing
TDM services require low delay and jitter and fixed bandwidth. A high QoS priority
must be specified for TDM services.
CES implementation
CESoPSN services are encapsulated through MPLS, with the structure defined in draft-ietf-pwe3-cesopsn-07, as shown in Figure 1-581.
MPLS Label
The specified PSN header includes data required for forwarding packets from the PSN
border gateway to the TDM border gateway.
PWs are distinguished by PW labels carried at the specified layer of the PSN. Because TDM is bidirectional, two PWs in opposite directions must be associated.
PW Control Word
The structure of the CESoPSN control word is defined in draft-ietf-pwe3-cesopsn-07, as shown in Figure 1-582.
− Length (6 bits): length of a TDMoPSN packet (control word and payload) when padding is used to meet the minimum transmission unit requirement of the PSN. When the TDMoPSN packet is longer than 64 bytes, this field is set to all 0s.
− Sequence number (16 bits): used for PW sequencing, enabling the detection of lost and mis-ordered packets. The sequence number occupies a 16-bit unsigned circular space, and its initial value is random.
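The control word fields described above can be sketched as a bit-packing routine. The field positions follow RFC 5086, which standardized draft-ietf-pwe3-cesopsn; the function name is illustrative, not a product API:

```python
import struct

def pack_cesopsn_control_word(l_bit, r_bit, m_bits, frg, length, seq):
    """Pack a 32-bit CESoPSN control word (layout per RFC 5086):
    0000 | L | R | M(2 bits) | FRG(2 bits) | LEN(6 bits) | sequence number(16 bits)."""
    word = ((l_bit & 1) << 27 | (r_bit & 1) << 26 | (m_bits & 0x3) << 24
            | (frg & 0x3) << 22 | (length & 0x3F) << 16 | (seq & 0xFFFF))
    return struct.pack("!I", word)  # network byte order

# A control word with no alarms, no padding, and sequence number 0x1234.
cw = pack_cesopsn_control_word(0, 0, 0, 0, 0, 0x1234)
```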
Optional RTP
An RTP header can carry timestamp information to the remote device to support clock recovery from packets, such as DCR. Clock recovery from packets is not discussed in this document. In addition, packets transmitted between some devices must include the RTP header. To save bandwidth, the RTP header is not recommended in other situations.
The RTP header is not configured by default. You can add it to packets. Configurations
of PEs on both sides must be the same; otherwise, two PEs cannot communicate with
each other.
On the NE20E, the RTP header is padded by keeping the sequence number (16 bits) consistent with that in the PW control word and setting the other bits to 0s.
TDM Payload
The length of the TDM payload (in bytes) is the number of encapsulated frames multiplied by the number of timeslots bound to the PW. When the whole PW packet is shorter than 64 bytes, fixed bit fields are padded to meet Ethernet transmission requirements.
SAToP implementation
SAToP services are encapsulated through MPLS, with the structure defined in RFC 4553 (Structure-Agnostic Time Division Multiplexing over Packet), as shown in Figure 1-584.
MPLS Label
The MPLS label for SAToP is the same as the MPLS label for CESoPSN.
PW Control Word
The structure of the SAToP control word is defined in RFC 4553 (Structure-Agnostic Time Division Multiplexing over Packet), as shown in Figure 1-585.
The optional RTP for SAToP is the same as the optional RTP for CESoPSN.
TDM Payload
The length of the TDM payload (in bytes) is the number of encapsulated frames multiplied by 32. When the whole PW packet is shorter than 64 bytes, fixed bits are padded to meet Ethernet transmission requirements.
Implementation Procedures
E1 frames are transmitted at 8000 frames/second, and each frame is 32 bytes. An E1 frame consists of 32 timeslots, each corresponding to one of the 32 bytes. In CESoPSN mode, timeslot 0 (byte 0) serves as the frame header; it cannot carry data and is processed specially. The other 31 timeslots correspond to bytes 1 to 31 of each E1 frame. In SAToP mode, no frame header is used, and an E1 frame consists of 32 payload bytes.
As shown in Figure 1-586, the implementation procedure goes from CE1 through PE1 and PE2 to CE2. In the TDM transparent transmission direction from CE1 to PE1, in CESoPSN mode, PE1 encapsulates bytes 1 to 31 (the payload) of each E1 frame received from CE1 into a PW packet. In SAToP mode, PE1 encapsulates 256 bits (32 x 8 bits) from the bit stream as payload into a PW packet. Because the E1 frame frequency is fixed, PE1 receives data (31 bytes or 256 bits) at a fixed frequency from CE1 and encapsulates it into the PW packet continuously. When the number of encapsulated frames reaches the pre-configured number, the whole PW packet is sent to the PSN.
In the encapsulation structure of a PW packet, the control word is mandatory. Note the L bit, R bit, and sequence number fields. The L and R bits carry alarm information. They are used when the TDM transparent transmission process transmits E1 frame data received by PE1 over a PW to an E1 interface of PE2, and PE1 needs to transmit alarm information (such as AIS and RDI) from CE1 to the remote device. PE1 reports received alarm information (AIS/RDI) to the control plane. The control plane modifies the L and R bits in the control word of the PW packet, which is then sent with the E1 frame data to PE2.
The sequence number is used to prevent PW packets from being discarded or disordered
during forwarding on the PSN. Every time a PW packet is sent by PE1, the sequence number
increases by 1.
The downstream traffic goes from PE2 to CE2. After receiving a PW packet from the PSN, PE2 caches the packet in one of several buffers, selected by masking the sequence number. For example, if the sequence number is 16 bits and 256 buffers are configured, the lowest 8 bits of the sequence number are used as the buffer address. When the sequence numbers of received PW packets are consecutive and the configured jitter buffer for the PW reaches its threshold, the PW packets are unpacked and sent. For example, if 8 frames are encapsulated in each packet, then at 8000 frames/second each packet takes 1 ms to accumulate; if the jitter buffer is configured to 3 ms, PW packets are not sent until three packets have been buffered.
If the PW packet corresponding to a sequence number is not received, an idle code (its
payload is configurable) is sent.
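The buffer-selection and jitter-buffer arithmetic described above can be sketched as follows (the names and sizes are assumptions drawn from the example in the text):

```python
NUM_BUFFERS = 256          # buffers indexed by the low 8 bits of the sequence number
FRAMES_PER_PACKET = 8      # 8 frames/packet at 8000 frames/s -> 1 ms per packet
JITTER_BUFFER_MS = 3       # packets are held until 3 ms of data is buffered

def buffer_index(seq16: int) -> int:
    """Select a cache buffer by masking the 16-bit sequence number."""
    return seq16 & (NUM_BUFFERS - 1)

def packets_to_hold() -> int:
    """Number of packets buffered before playout starts."""
    ms_per_packet = FRAMES_PER_PACKET * 1000 // 8000
    return JITTER_BUFFER_MS // ms_per_packet
```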
The L and R bits must be processed before the PW packet is parsed and the sequence number is processed. The L and R bits that carry alarm information are sent to PE2. After the payload is extracted, it is sent to CE2 at the same frequency as CE1 transmits, with 31 bytes or 256 bits per frame; otherwise, PE2 overruns or underruns. Therefore, clock synchronization (frequency synchronization) is required between the CE1 clock and the PE2 clock in TDM transparent transmission.
The recommended mode for frequency synchronization in TDM transparent transmission is ACR/DCR: PE2 calculates the sending clock frequency of CE1 according to the frequency of received PW packets and then uses this recovered clock frequency on the AC side to send E1 frame data.
As shown in Figure 1-586, it is assumed that data is transmitted from CE2 to CE1. Alarm
transparent transmission is the process of transmitting E1/T1 alarms on PE1 to downstream
PE2 through the PW control word, restoring E1/T1 alarms, and then transmitting them to CE2,
and vice versa.
The types of alarms that can be transparently transmitted are AIS and RDI. Involved PW
control words are the L bit, R bit, and M bit.
Other Features
The TDM interface can be created on either a common E1 interface or an E1 interface that is
channelized from CPOS.
Both the non-slotted TDM interface (SAToP transparent transmission) and the slotted TDM
interface (CES transparent transmission) can be created.
The serial port supports encapsulation of packets through multiple protocols such as TDM,
ATM, PPP, and HDLC.
The dynamic or static PW protocol is supported.
MPLS Label
The specified PSN header includes data required to forward packets from a PSN border
gateway to a TDM border gateway.
PWs are distinguished by MPLS labels that are carried on a specified PSN layer. To
transmit bidirectional TDM services, two PWs that transmit in opposite directions are
associated.
CEP Header
Figure 1-588 shows the CEP header format.
The sequence number (16 bits) in the RTP header is padded in the same way as that in
the CEP header. The other bits in the RTP header are 0s.
TDM Payload
The TDM packet payload can only be 783 bytes.
Implementation
Each STM-1 frame consists of 9 rows and 270 columns. VC-4 occupies 9 rows and 261
columns, a total of 2349 bytes. As a CEP payload is 783-bytes long, one VC-4 can be broken
into three CEP packets.
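The fragmentation arithmetic above can be checked with a short sketch (constant names are illustrative):

```python
STM1_ROWS, STM1_COLS = 9, 270   # an STM-1 frame is 9 rows x 270 columns
VC4_COLS = 261                  # VC-4 occupies 9 rows x 261 columns
CEP_PAYLOAD = 783               # fixed CEP payload size in bytes

vc4_bytes = STM1_ROWS * VC4_COLS        # total VC-4 bytes per STM-1 frame
packets_per_vc4 = vc4_bytes // CEP_PAYLOAD
```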
In the following example, CEP packets are transmitted along the path CE1 -> PE1 -> PE2 -> CE2. On the uplink of TDM transparent transmission from CE1 to PE1, PE1 fragments the VC-4 contained in an SDH frame sent by CE1 into 783-byte payloads and encapsulates each payload into a PW packet. Because the SDH frame frequency is fixed, PE1 receives data at a fixed frequency from CE1 and encapsulates it into PW packets continuously. When the number of encapsulated frames reaches the pre-configured number, the whole PW packet is sent to the PSN.
In the encapsulation structure of a PW packet, the CEP header is mandatory. The L bit and R
bit are used to carry alarm information. PE1 transmits its received SDH frame data to an SDH
interface of PE2 over a PW on the PSN and transmits alarm information (such as AIS and
RDI) received from CE1 to a remote device. PE1 reports received alarm information
(LOS/LOF/AUAIS/MSAIS/AULOP) to the control plane. The control plane modifies the L
bit and R bit in the control word of the PW packet and then sends them with SDH frame data
to PE2.
The sequence number is used to detect PW packets that are forwarded out of order (and therefore discarded) on the PSN. Each time PE1 sends a PW packet, the sequence number increases by 1.
On the downlink of TDM transparent transmission from PE2 to CE2, upon receipt of a PW packet from the PSN, PE2 caches the packet in one of several buffers, selected by masking the sequence number. For example, if the sequence number is 16 bits and 256 buffers are configured, the lowest 8 bits of the sequence number are used as the buffer address. When the sequence numbers of received PW packets are consecutive and the configured jitter buffer for the PW reaches its threshold, the PW packets are unpacked and sent.
If the PW packet corresponding to a sequence number is not received, an idle code (its
payload is configurable) is sent.
The L and R bits need to be processed before the PW packet is parsed and the sequence
number is processed. The L and R bits that carry alarm information are sent to PE2. After
being extracted from the PW packet, these payloads are assembled into a VC-4 and integrated
into an SDH frame. The SDH frame is then sent to CE2 at the same frequency as that when
the SDH frame is sent by CE1. Otherwise, PE2 overruns or underruns. Therefore, clock
synchronization (frequency synchronization) is required between the CE1 clock and PE2
clock in TDM transparent transmission.
The types of alarms that can be transparently transmitted are LOS, LOF, AUAIS, MSAIS, and
AULOP.
Involved PW control words are the L bit and R bit.
1.8.9.3 Applications
Applicable Scenario 1
Scenario description
After TDM services from 2G base stations are converged on the E1 or T1 interface on PE1,
TDM packets are encapsulated into PSN packets that can be transmitted on PSNs. After
reaching downstream PE2, PSN packets are decapsulated to original TDM packets and then
the TDM packets are sent to the 2G convergence device.
Advantages of the solution
In the solution, multiple types of services are converged at a PE on the PSN. The solution effectively saves original network resources, uses fewer PDH VLLs, and facilitates site deployment and the maintenance and administration of multiple services.
Applicable Scenario 2
Scenario description
TDM services of different office areas, residential areas, schools, enterprises, and institutions
can be accessed by a local PE through E1/T1 links. Heavy TDM services can be carried
through CPOS interfaces.
Advantages of the solution
The solution saves VLL rent because TDM services for enterprises are accessed by a local PE. In addition, the solution allows flexible selection of access types and proper network planning.
Applicable Scenario 3
Scenario description
In this solution, a network can carry 2G, 3G, and fixed-network services concurrently. The solution physically integrates the transmission of different types of services while keeping their management independent. Therefore, it provides different service bearer solutions for different operators on the same network.
Advantages of the solution
In the solution, different services can be carried on the same network and therefore the
resource utilization is improved and maintenance cost is reduced.
Applicable Scenario 4
Scenario description
Services in different timeslots at different sites can access the PSN through local E1 links. The PE on the convergence side binds different timeslots of different E1s into one E1, encapsulates the bound timeslots and other CE1/E1 services as SDH data, and finally sends the encapsulated packets to the base station controller (BSC) through a CPOS interface.
Advantages of the solution
The solution channelizes E1 services, transparently transmits E1 services, multiplexes
timeslots of multiple E1s to one E1, and manages services of multiple E1s/CE1s through the
same CPOS interface.
Terms
None
1.9 IP Services
1.9.1 About This Document
Purpose
This document describes the IP services feature in terms of its overview, principles, and
applications.
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
Network planning engineers
Commissioning engineers
Security Declaration
Encryption algorithm declaration
The encryption algorithms DES/3DES/SKIPJACK/RC2/RSA (RSA-1024 or lower)/MD2/MD4/MD5 (in digital signature scenarios and password encryption)/SHA1 (in digital signature scenarios) have low security and may introduce security risks. If the protocols allow, using more secure algorithms, such as AES/RSA (RSA-2048 or higher)/SHA2/HMAC-SHA2, is recommended.
Password configuration declaration
− Do not set both the start and end characters of a password to "%^%#". Otherwise, the password is displayed directly in the configuration file.
− To further improve device security, periodically change the password.
Personal data declaration
Your purchased products, services, or features may use some of users' personal data during service operation or fault locating. You must define user privacy policies in compliance with local laws and take proper measures to fully protect personal data.
Feature declaration
− The NetStream feature may be used to analyze the communication information of
terminal customers for network traffic statistics and management purposes. Before
enabling the NetStream feature, ensure that it is performed within the boundaries
permitted by applicable laws and regulations. Effective measures must be taken to
ensure that information is securely protected.
− The mirroring feature may be used to analyze the communication information of
terminal customers for a maintenance purpose. Before enabling the mirroring
function, ensure that it is performed within the boundaries permitted by applicable
laws and regulations. Effective measures must be taken to ensure that information is
securely protected.
− The packet header obtaining feature may be used to collect or store some
communication information about specific customers for transmission fault and
error detection purposes. Huawei cannot offer services to collect or store this
information unilaterally. Before enabling the function, ensure that it is performed
within the boundaries permitted by applicable laws and regulations. Effective
measures must be taken to ensure that information is securely protected.
Reliability design declaration
Network planning and site design must comply with reliability design principles and
provide device- and solution-level protection. Device-level protection includes planning
principles of dual-network and inter-board dual-link to avoid single point or single link
of failure. Solution-level protection refers to a fast convergence mechanism, such as FRR
and VRRP.
Special Declaration
This document serves only as a guide. The content is written based on device
information gathered under lab conditions. The content provided by this document is
intended to be taken as general guidance, and does not cover all scenarios. The content
provided by this document may be different from the information on user device
interfaces due to factors such as version upgrades and differences in device models,
board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are
beyond the scope of this document.
The maximum values provided in this document are obtained in specific lab
environments (for example, only a certain type of board or protocol is configured on a
tested device). The actually obtained maximum values may be different from the
maximum values provided in this document due to factors such as differences in
hardware configurations and carried services.
Interface numbers used in this document are examples. Use the existing interface
numbers on devices for configuration.
The pictures of hardware in this document are for reference only.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
Indicates an imminently hazardous situation which, if not
avoided, will result in death or serious injury.
Change History
Updates between document issues are cumulative. Therefore, the latest document issue
contains all updates made in previous issues.
Changes in Issue 03 (2017-09-20)
This issue is the third official release. The software version of this issue is
V800R009C10SPC200.
Changes in Issue 02 (2017-07-30)
This issue is the second official release. The software version of this issue is
V800R009C10SPC100.
1.9.2 ARP
1.9.2.1 Introduction
Definition
The Address Resolution Protocol (ARP) is an Internet protocol used to map IP addresses to
MAC addresses.
Purpose
If two hosts need to communicate, the sender must know the network-layer IP address of the
receiver. IP datagrams, however, must be encapsulated with MAC addresses before they can
be transmitted over the physical network. Therefore, ARP is needed to map IP addresses to
MAC addresses to ensure the transmission of datagrams.
Function Overview
Table 1-147 lists ARP features.
Benefits
ARP ensures communication by mapping IP addresses at the network layer to MAC addresses
at the link layer on Ethernet networks.
1.9.2.2 Principles
1.9.2.2.1 Basic Principles
Related Concepts
ARP involves the following concepts:
Address Resolution Protocol (ARP) messages
An ARP message can be an ARP request or reply message. Figure 1-595 shows the ARP
message format.
The Ethernet Address of destination field contains a total of 48 bits. Ethernet Address of destination
(0-31) indicates the first 32 bits of the Ethernet Address of destination field, and Ethernet Address of
destination (32-47) indicates the last 16 bits of the Ethernet Address of destination field.
An ARP message consists of 42 bytes. The first 14 bytes indicate the Ethernet frame
header, and the last 28 bytes are the ARP request or reply message content. Table 1-148
describes the fields in an ARP message.
Ethernet address of sender (48 bits): source MAC address. The value of this field is the same as the Ethernet source MAC address in the Ethernet frame header.
IP address of sender (32 bits): source IP address.
Ethernet address of destination (48 bits): destination MAC address. The value of this field in an ARP request message is 0x00-00-00-00-00-00.
IP address of destination (32 bits): destination IP address.
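The 42-byte ARP request described above (14-byte Ethernet header plus 28-byte ARP message) can be sketched as follows; the function name and sample addresses are illustrative:

```python
import struct

def build_arp_request(src_mac: bytes, src_ip: bytes, dst_ip: bytes) -> bytes:
    """Build a 42-byte Ethernet-framed ARP request:
    14-byte Ethernet header + 28-byte ARP message."""
    broadcast = b"\xff" * 6
    eth_header = broadcast + src_mac + struct.pack("!H", 0x0806)  # EtherType ARP
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,             # hardware type: Ethernet
        0x0800,        # protocol type: IPv4
        6, 4,          # hardware/protocol address lengths
        1,             # operation: 1 = ARP request
        src_mac, src_ip,
        b"\x00" * 6,   # Ethernet address of destination: all 0s in a request
        dst_ip,
    )
    return eth_header + arp

pkt = build_arp_request(b"\x02\x00\x00\x00\x00\x01",
                        bytes([192, 168, 1, 1]), bytes([192, 168, 1, 2]))
```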
ARP table
An ARP table contains the latest mapping between IP and MAC addresses. If a host
always broadcasts an ARP request message for a MAC address before it sends an IP
datagram, network communication traffic will greatly increase. Furthermore, all other
hosts on the network have to receive and process the ARP request messages, which
lowers network efficiency. To solve this problem, an ARP table is maintained on each
host to ensure efficient ARP operations. The mapping between an IP address and a MAC
address is called an ARP entry.
ARP entries can be classified as dynamic or static.
− Dynamic ARP entries are automatically generated and maintained by using ARP
messages. Dynamic ARP entries can be aged and overwritten by static ARP entries.
− Static ARP entries are manually configured and maintained by a network
administrator. Static ARP entries can neither be aged nor be overwritten by dynamic
ARP entries.
Before sending IP datagrams, a host searches the ARP table for the MAC address
corresponding to the destination IP address.
− If the ARP table contains the corresponding MAC address, the host directly sends
the IP datagrams to the MAC address instead of sending an ARP request message.
− If the ARP table does not contain the corresponding MAC address, the host
broadcasts an ARP request message to request the MAC address of the destination
host.
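The lookup logic above can be sketched as follows (a dictionary stands in for the ARP table; the entries are illustrative):

```python
# Illustrative ARP table: IP address -> MAC address.
arp_table = {"192.168.1.2": "02-00-00-00-00-02"}

def resolve(dest_ip: str):
    """Return the cached MAC address, or None to signal that an
    ARP request must be broadcast for this destination."""
    mac = arp_table.get(dest_ip)
    if mac is None:
        # Not cached: the host would broadcast an ARP request here.
        return None
    return mac  # Cached: send IP datagrams directly to this MAC.
```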
Reverse Address Resolution Protocol (RARP)
If only the MAC address of a host is available, the host can send and receive RARP
messages to obtain its IP address.
To do so, the network administrator must establish the mapping between MAC addresses
and IP addresses on a gateway. When a new host is configured, its RARP client requests
the host's IP address from the RARP server on the gateway.
Implementation
ARP implementation within a network segment
Figure 1-596 illustrates how ARP is implemented within a network segment, by using IP
datagram transmission from Host A to Host B as an example.
Figure 1-596 ARP implementation between Host A and Host B on the same network
segment
a. Host A searches its ARP table and does not find the mapping between the IP and
MAC addresses of Host B. Host A then sends an ARP request message for the
MAC address of Host B. In this ARP request message, the source IP and MAC
addresses are respectively the IP and MAC addresses of Host A, the destination IP
and MAC addresses are respectively the IP address of Host B and
00-00-00-00-00-00, and the Ethernet source MAC address and Ethernet destination
MAC address are respectively the MAC address of Host A and the broadcast MAC
address.
b. After CE1 receives the ARP request message, CE1 broadcasts it on the network
segment.
c. After Host B receives the ARP request message, Host B adds the MAC address of
Host A to its ARP table and sends an ARP reply message to Host A. In this ARP
reply message, the source IP and MAC addresses are respectively the IP and MAC
addresses of Host B, the destination IP and MAC addresses are respectively the IP
and MAC addresses of Host A, and the Ethernet source and destination MAC
addresses are respectively the MAC addresses of Host B and Host A.
The PE also receives the ARP request message but discards it because the destination IP address in the
ARP request message is not its own IP address.
d. CE1 receives the ARP reply message and forwards it to Host A.
e. After Host A receives the ARP reply message, Host A adds the MAC address of
Host B to its ARP table and sends the IP datagrams to Host B.
ARP implementation between different network segments
ARP messages are Layer 2 messages. Therefore, ARP is applicable only to devices on
the same network segment. If two hosts on different network segments need to
communicate, the source host sends IP datagrams to the default gateway, which in turns
forwards the IP datagrams to the destination host. ARP implementation between different
network segments involves separate ARP implementation within network segments. In
this manner, hosts on different network segments can communicate.
The following examples show how ARP is implemented between different network
segments, by using IP datagram transmission from Host A to Host C as an example.
Figure 1-597 illustrates how ARP is implemented between Host A and the PE on the
same network segment.
a. Host A searches its ARP table and does not find the mapping between the IP and
MAC addresses of Interface 1 on the default gateway PE that connects to Host C.
Host A then sends an ARP request message for the MAC address of the PE's
Interface 1. In this ARP request message, the source IP and MAC addresses are
respectively the IP and MAC addresses of Host A, the destination IP and MAC
addresses are respectively the IP address of the PE's Interface 1 and
00-00-00-00-00-00, and the Ethernet source and destination MAC addresses are
respectively the MAC address of Host A and the broadcast MAC address.
b. After CE1 receives the ARP request message, CE1 broadcasts it on the network
segment.
c. After the PE receives the ARP request message, the PE adds the MAC address of
Host A to its ARP table and sends an ARP reply message to Host A. In this ARP
reply message, the source IP and MAC addresses are respectively the IP and MAC
addresses of the PE's Interface 1, the destination IP and MAC addresses are
respectively the IP and MAC addresses of Host A, and the Ethernet source and
destination MAC addresses are respectively the MAC address of the PE's Interface
1 and the MAC address of Host A.
Host B also receives the ARP request message but discards it because the destination IP address in the
ARP request message is not its own IP address.
d. CE1 receives the ARP reply message and forwards it to Host A.
e. After Host A receives the ARP reply message, Host A adds the MAC address of the
PE's Interface 1 to its ARP table and sends the IP datagrams to the PE.
Figure 1-598 illustrates ARP implementation between the PE and Host C on the same
network segment.
The PE searches its routing table and sends the IP datagrams from Interface 1 to
Interface 2.
a. The PE searches its ARP table and does not find the mapping between the IP
address and MAC address of Host C. Then, the PE sends an ARP request message
for the MAC address of Host C. In this ARP request message, the source IP and
MAC addresses are respectively the IP and MAC addresses of the PE's Interface 2,
the destination IP and MAC addresses are respectively the Host C's IP address and
00-00-00-00-00-00, and the Ethernet source and destination MAC address are
respectively the MAC address of Interface 2 on PE and the broadcast MAC address.
b. After CE2 receives the ARP request message, CE2 broadcasts it on the network
segment.
c. After Host C receives the ARP request message, Host C adds the MAC address of
the PE's Interface 2 to its ARP table and sends an ARP reply message to the PE. In
this ARP reply message, the source IP and MAC addresses are respectively the IP
and MAC addresses of Host C, the destination IP and MAC addresses are
respectively the IP and MAC addresses of the PE's Interface 2, and the Ethernet
source and destination MAC addresses are respectively the MAC address of Host C
and the MAC address of Interface 2 on PE.
Host D also receives the ARP request message but discards it because the destination IP address in the
ARP request message is not its own IP address.
d. CE2 receives the ARP reply message and forwards it to the PE.
e. After the PE receives the ARP reply message, the PE adds the MAC address of
Host C to its ARP table and sends the IP datagrams to Host C.
So far, the IP datagram transmission from Host A to Host C is complete.
1. ARP request messages are broadcast, whereas ARP reply messages are unicast.
2. In ARP implementation, the switches CE1 and CE2 transparently forward IP datagrams and do not
modify them.
Definition
Dynamic ARP allows devices to dynamically learn and update the mapping between IP and
MAC addresses using ARP messages. You do not need to manually configure the mapping.
Related Concepts
Dynamic ARP uses the dynamic ARP aging mechanism.
The dynamic ARP aging mechanism enables an ARP entry that is not used over a specified
period to be automatically deleted. This mechanism helps reduce storage space of ARP tables
and speed up ARP table queries.
Table 1-149 describes concepts related to the dynamic ARP aging mechanism.
Implementation
Dynamic ARP entries can be created, updated, and aged.
Creating and updating dynamic ARP entries
If a device receives an ARP message that meets either of the following conditions, the
device automatically creates or updates an ARP entry:
− The source IP address of the ARP message is on the same network segment as the
IP address of the inbound interface. The destination IP address of the ARP message
is the IP address of the inbound interface.
− The source IP address of the ARP message is on the same network segment as the
IP address of the inbound interface. The destination IP address of the ARP message
is the virtual IP address of the VRRP backup group configured on the interface on
the device.
Aging dynamic ARP entries
After the aging timer of a dynamic ARP entry on a device expires, the device sends ARP
aging probe messages to the peer device. If the device receives no ARP reply message
after a specified number of aging probe attempts, it deletes the dynamic ARP entry. A
shutdown operation on an interface triggers deletion of the ARP entries on that interface,
and a shutdown operation on a VS triggers deletion of the ARP entries in that VS.
This feature limits the rate at which ARP probe messages are sent to prevent ARP
probing from consuming too many system resources. In high-specification scenarios, a
long time may therefore elapse between the start of ARP probing and the completion of
ARP entry aging.
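The entry creation conditions and aging behavior described above can be sketched as follows. This is a minimal simulation, not device code; the aging time, probe count, and table layout are illustrative assumptions.

```python
# Hypothetical sketch of dynamic ARP entry creation and aging; the constants
# below are illustrative, not the device's actual defaults.
import ipaddress
import time

AGING_TIME = 1200      # assumed aging time in seconds
PROBE_ATTEMPTS = 3     # assumed number of aging probe attempts

class ArpTable:
    def __init__(self):
        self.entries = {}  # ip -> {"mac": ..., "updated": ...}

    def on_arp_message(self, src_ip, src_mac, dst_ip, if_ip, if_prefix, vrrp_vip=None):
        """Create or update an entry only if the conditions above are met."""
        net = ipaddress.ip_network(f"{if_ip}/{if_prefix}", strict=False)
        same_segment = ipaddress.ip_address(src_ip) in net
        targets_us = dst_ip == if_ip or (vrrp_vip is not None and dst_ip == vrrp_vip)
        if same_segment and targets_us:
            self.entries[src_ip] = {"mac": src_mac, "updated": time.time()}
            return True
        return False

    def age(self, now, send_probe):
        """After the aging timer expires, probe; delete after repeated failures."""
        for ip in list(self.entries):
            if now - self.entries[ip]["updated"] < AGING_TIME:
                continue
            if not any(send_probe(ip) for _ in range(PROBE_ATTEMPTS)):
                del self.entries[ip]  # entry is aged out
```

A message from another network segment, or one not addressed to the interface (or VRRP virtual) IP address, creates no entry, which mirrors the two conditions listed above.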
Enhanced Functions
Dynamic ARP has an enhanced Layer 2 topology probe function. This function enables a
device to set the aging time to 0 for all ARP entries corresponding to a VLAN to which a
Layer 2 interface belongs when the Layer 2 interface becomes Up. The device then resends
ARP probe messages to update all ARP entries.
If a non-Huawei device that connects to a Huawei device receives an ARP aging probe
message with the destination MAC address as the broadcast address and the ARP table of the
non-Huawei device contains the mapping between the IP address and MAC address of the
Huawei device, the non-Huawei device does not respond to the broadcast ARP aging probe
message. Therefore, the Huawei device considers the link to the non-Huawei device Down
and deletes the mapping between the IP address and MAC address of the non-Huawei device.
To prevent this problem, configure Layer 2 topology change so that the Huawei device
unicasts ARP aging probe messages to the non-Huawei device.
Usage Scenario
Dynamic ARP applies to a network with a complex topology, insufficient bandwidth resources,
and a high real-time communication requirement.
Benefits
Dynamic ARP entries are dynamically created and updated using ARP messages. They do not
need to be manually maintained, greatly reducing maintenance workload.
Definition
Static ARP allows a network administrator to create the mapping between IP and MAC
addresses.
Principles
Static ARP implements the following functions:
Binds IP addresses to the MAC address of a specified gateway so that IP datagrams
destined for these IP addresses must be forwarded by this gateway.
Binds the destination IP addresses of IP datagrams sent by a specified host to a
nonexistent MAC address, helping filter out unwanted IP datagrams.
To ensure the stability and security of network communication, deploy static ARP based on
actual requirements and network resources.
Related Concepts
Static ARP entries are classified as short or long entries.
Short static ARP entries
Short static ARP entries contain only IP and MAC addresses. A device still has to send
ARP request messages. If the source IP and MAC addresses of the received reply
messages are the same as the configured IP and MAC addresses in a short static ARP
entry, the device adds the interface that receives the ARP reply messages to the short
static ARP entry. The device can use this interface to forward subsequent messages
directly. Short static ARP entries cannot be directly used to forward messages.
Configuring short static ARP entries enables a host and a device to communicate using
fixed IP and MAC addresses.
In Network Load Balancing (NLB) scenarios, you must configure both MAC entries with multiple
outbound interfaces and short static ARP entries for the gateway. These MAC entries and short static
ARP entries must have the same MAC address. In NLB scenarios, short static ARP entries are also
called ARP entries with multiple outbound interfaces and cannot be updated manually.
Long static ARP entries
Long static ARP entries contain IP and MAC addresses as well as the VLAN and
outbound interface through which devices send packets. Long static ARP entries are
directly used to forward messages.
Configuring long static ARP entries enables a host and a device to communicate through
a specified interface in a VLAN.
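The difference between short and long static ARP entries can be sketched with a simple data structure. The field names here are assumptions for illustration; only the distinction (a long entry also fixes the VLAN and outbound interface, while a short entry learns its interface from an ARP reply) comes from the text above.

```python
# Illustrative sketch of short vs. long static ARP entries; field names are
# hypothetical, not the device's internal representation.
from dataclasses import dataclass
from typing import Optional

@dataclass
class StaticArpEntry:
    ip: str
    mac: str
    vlan: Optional[int] = None            # set only for long entries
    out_interface: Optional[str] = None

    def is_long(self):
        # A long entry also fixes the VLAN and outbound interface,
        # so it can be used for forwarding directly.
        return self.vlan is not None and self.out_interface is not None

    def resolve_short(self, reply_src_ip, reply_src_mac, reply_interface):
        """A short entry learns its interface from a matching ARP reply."""
        if not self.is_long() and reply_src_ip == self.ip and reply_src_mac == self.mac:
            self.out_interface = reply_interface
            return True
        return False
```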
Usage Scenario
Static ARP applies to the following scenarios:
Networks with a simple topology and high stability
Networks on which information security is of high priority, such as a government or
military network
Short static ARP entries mainly apply to scenarios in which network administrators want
to bind hosts' IP and MAC addresses but hosts' access interfaces can change.
Benefits
Static ARP ensures communication security. If a static ARP entry is configured on a device,
the device can communicate with the peer device using only the specified MAC address.
Network attackers cannot modify the mapping between the IP and MAC addresses using ARP
messages, ensuring communication between the two devices.
Principles
Gratuitous ARP allows a device to broadcast gratuitous ARP messages that carry the local IP
address as both the source and destination IP addresses to notify the other devices on the same
network segment of its address information. Gratuitous ARP is used in the following scenarios
to ensure the stability and reliability of network communication:
You need to check whether the IP address of a device conflicts with the IP address of
another device on the same network segment. The IP address of each device must be
unique to ensure the stability of network communication.
After the MAC address of a host changes after its network adapter is replaced, the host
must quickly notify other devices on the same network segment of the MAC address
change before the ARP entry is aged. This ensures the reliability of network
communication.
When a master/backup switchover occurs in a VRRP backup group, the new master
device must notify other devices on the same network segment of its status change.
Related Concepts
Gratuitous ARP uses gratuitous ARP messages. A gratuitous ARP message is a special ARP
message that carries the sender's IP address as both the source and destination IP addresses.
Implementation
Gratuitous ARP is implemented as follows:
If a device finds that the source IP address in a received gratuitous ARP message is the
same as its own IP address, the device sends a gratuitous ARP message to notify the
sender of the address conflict.
If a device finds that the source IP address in a received gratuitous ARP message is
different from its own IP address, the device updates the corresponding ARP entry with
the sender's IP and MAC addresses carried in the gratuitous ARP message.
Figure 1-599 illustrates how gratuitous ARP is implemented.
As shown in Figure 1-599, the IP address of Interface 1 on PE1 is 10.1.1.1, and the IP address
of Interface 2 on PE2 is 10.1.1.1.
1. Interface 1 broadcasts an ARP request message. Interface 2 receives the ARP request
message and finds that the source IP address in the message conflicts with its own IP
address. Interface 2 then performs the following operations:
a. Sends a gratuitous ARP message to notify Interface 1 of its IP address.
b. Generates a conflict node on its conflict link and then sends gratuitous ARP
messages to Interface 1 at an interval of 5 seconds.
2. Interface 1 receives the gratuitous ARP messages from Interface 2 and finds that the
source IP address in the messages conflicts with its own IP address. Interface 1 then
performs the following operations:
a. Sends a gratuitous ARP message to notify Interface 2 of its IP address.
b. Generates a conflict node on its conflict link and then sends gratuitous ARP
messages to Interface 2 at an interval of 5 seconds.
Interface 1 and Interface 2 send gratuitous ARP messages to each other at an interval of 5
seconds until the address conflict is rectified.
If one interface does not receive a gratuitous ARP message from the other interface within 8
seconds, the interface considers the address conflict rectified. The interface deletes the
conflict node on its conflict link and stops sending gratuitous ARP messages to the other
interface.
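The conflict timers above (a 5-second resend interval and an 8-second silence window) can be sketched as a small state machine. The interval and timeout values come from the text; the method names are assumptions.

```python
# Sketch of the gratuitous-ARP conflict node described above; the 5 s resend
# interval and 8 s resolve window are taken from the text.
SEND_INTERVAL = 5
RESOLVE_TIMEOUT = 8

class ConflictNode:
    def __init__(self, now):
        self.last_sent = now
        self.last_heard = now

    def tick(self, now, peer_garp_seen):
        """Return the action for this tick: send another message, resolve, or wait."""
        if peer_garp_seen:
            self.last_heard = now
        if now - self.last_heard >= RESOLVE_TIMEOUT:
            return "resolved"          # delete the conflict node, stop sending
        if now - self.last_sent >= SEND_INTERVAL:
            self.last_sent = now
            return "send_garp"         # notify the peer of the conflict again
        return "wait"
```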
Functions
Gratuitous ARP has the following functions:
Checks for IP address conflicts. If a device receives a gratuitous ARP message from
another device, the IP addresses of the two devices conflict.
Notifies MAC address changes. When the MAC address of a host changes after its
network adapter is replaced, the host sends a gratuitous ARP message to notify other
devices of the MAC address change before the ARP entry is aged. This ensures the
reliability of network communication. After receiving the gratuitous ARP message, other
devices maintain the corresponding ARP entry in their ARP tables based on the address
information carried in the message.
Notifies status changes. When a master/backup switchover occurs in a VRRP backup
group, the new master device sends a gratuitous ARP message to notify other devices on
the network of its status change.
Benefits
Gratuitous ARP reveals address conflict on a network so that ARP tables of devices can be
quickly updated. This feature ensures the stability and reliability of network communication.
Principles
ARP is applicable only to devices on the same physical network. When a device on a physical
network needs to send IP datagrams to another physical network, the gateway is used to query
the routing table to implement communication between the two networks. However, routing
table query consumes system resources and can affect other services. To resolve this problem,
deploy proxy ARP on an intermediate device. Proxy ARP enables devices that reside on
different physical network segments but on the same IP network to resolve IP addresses to
MAC addresses. This feature helps reduce system resource consumption caused by routing
table queries and improves the efficiency of system processing.
Implementation
Routed proxy ARP
A large company network is usually divided into multiple subnets to facilitate
management. The routing information of a host in a subnet can be modified so that IP
datagrams sent from this host to another subnet are first sent to the gateway and then to
another subnet. However, this solution makes it hard to manage and maintain devices.
Deploying proxy ARP on the gateway effectively resolves management and maintenance
problems caused by network division.
Figure 1-600 illustrates how routed proxy ARP is implemented between Host A and Host
B.
a. Host A sends an ARP request message for the MAC address of Host B.
b. After the PE receives the ARP request message, the PE checks the destination IP
address of the message and finds that it is not its own IP address and determines
that the requested MAC address is not its MAC address. The PE then checks
whether there are routes to Host B.
If a route to Host B is available, the PE checks whether routed proxy ARP is
enabled.
If routed proxy ARP is enabled on the PE, the PE sends the MAC address
of its Interface 1 to Host A.
If routed proxy ARP is not enabled on the PE, the PE discards the ARP
request message sent by Host A.
If no route to Host B is available, the PE discards the ARP request message
sent by Host A.
c. After Host A learns the MAC address of the PE's Interface 1, Host A sends IP
datagrams to the PE using this MAC address.
The PE receives the IP datagrams and forwards them to Host B.
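The routed proxy ARP branches above reduce to a short decision function. This is a sketch only; the routing-table lookup is stubbed as a set membership test, which is an assumption of this example.

```python
# Hypothetical decision logic for routed proxy ARP on the PE, mirroring the
# branches above; the routing table is stubbed as a set of reachable IPs.
def handle_arp_request(dst_ip, my_ip, my_mac, routes, routed_proxy_enabled):
    """Return the MAC address to answer with, or None to discard the request."""
    if dst_ip == my_ip:
        return my_mac                  # ordinary ARP reply for our own address
    if dst_ip in routes and routed_proxy_enabled:
        return my_mac                  # proxy reply: send our interface MAC
    return None                        # no route, or proxy ARP disabled: discard
```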
Intra-VLAN proxy ARP
Figure 1-601 illustrates how intra-VLAN proxy ARP is implemented between Host A
and Host C.
Host A, Host B, and Host C belong to the same VLAN, but Host A and Host C cannot
communicate at Layer 2 because port isolation is enabled on the CE. To allow Host A
and Host C to communicate, configure interface 1 (a Layer 3 interface) on the CE and
enable intra-VLAN proxy ARP.
a. Host A sends an ARP request message for the MAC address of Host C.
b. After the CE receives the ARP request message, the CE checks the destination IP
address of the message and finds that it is not its own IP address and determines
that the requested MAC address is not the MAC address of its Interface 1. The CE
then searches its ARP table for the ARP entry indicating the mapping between the
IP and MAC addresses of Host C.
If the CE finds this ARP entry in its ARP table, the CE checks whether
intra-VLAN proxy ARP is enabled.
If intra-VLAN proxy ARP is enabled on the CE, the CE sends the MAC
address of its interface 1 to Host A.
If intra-VLAN proxy ARP is not enabled on the CE, the CE discards the
ARP request message sent by Host A.
If the CE does not find this ARP entry in its ARP table, the CE discards the
ARP request message sent by Host A and checks whether intra-VLAN proxy
ARP is enabled.
If intra-VLAN proxy ARP is enabled on the CE, the CE broadcasts the
ARP request message with the IP address of Host C as the destination IP
address within VLAN 4. After the CE receives an ARP reply message
from Host C, the CE generates an ARP entry indicating the mapping
between the IP and MAC addresses of Host C.
If intra-VLAN proxy ARP is not enabled on the CE, the CE does not
perform any operations.
c. After Host A learns the MAC address of interface 1, Host A sends IP datagrams to
the CE using this MAC address.
The CE receives the IP datagrams and forwards them to Host C.
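The intra-VLAN branches above add one step to the routed case: when no ARP entry exists, the CE re-broadcasts the request within the VLAN and learns the reply. In this sketch the broadcast-and-learn step is simulated by a callback, which is an assumption of the example.

```python
# Sketch of the intra-VLAN proxy ARP branches above; broadcast_in_vlan is a
# hypothetical callback that returns the target's MAC, or None if no reply.
def intra_vlan_proxy(dst_ip, if_mac, arp_table, proxy_enabled, broadcast_in_vlan):
    """Return the MAC to reply with now, or None (possibly after learning)."""
    if not proxy_enabled:
        return None                       # discard the request
    if dst_ip in arp_table:
        return if_mac                     # reply with interface 1's MAC
    reply = broadcast_in_vlan(dst_ip)     # re-broadcast within the VLAN
    if reply is not None:
        arp_table[dst_ip] = reply         # learn the target's mapping for next time
    return None
```

The inter-VLAN and local proxy ARP flows follow the same structure, differing only in the scope of the re-broadcast (another VLAN or a bridge domain).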
Inter-VLAN proxy ARP
Figure 1-602 illustrates how inter-VLAN proxy ARP is implemented between Host A
and Host B.
a. Host A sends an ARP request message for the MAC address of Host B.
b. After the PE receives the ARP request message, the PE checks the destination IP
address of the message and finds that it is not its own IP address and determines
that the requested MAC address is not the MAC address of its interface 1. The PE
then searches its ARP table for the ARP entry indicating the mapping between the
IP and MAC addresses of Host B.
If the PE finds this ARP entry in its ARP table, the PE checks whether
inter-VLAN proxy ARP is enabled.
If inter-VLAN proxy ARP is enabled on the PE, the PE sends the MAC
address of its interface 1 to Host A.
If inter-VLAN proxy ARP is not enabled on the PE, the PE discards the
ARP request message sent by Host A.
If the PE does not find this ARP entry in its ARP table, the PE discards the
ARP request message sent by Host A and checks whether inter-VLAN proxy
ARP is enabled.
If inter-VLAN proxy ARP is enabled on the PE, the PE broadcasts the
ARP request message with the IP address of Host B as the destination IP
address within VLAN 2. After the PE receives an ARP reply message
from Host B, the PE generates an ARP entry indicating the mapping
between the IP and MAC addresses of Host B.
If inter-VLAN proxy ARP is not enabled on the PE, the PE does not
perform any operations.
c. After Host A learns the MAC address of interface 1, Host A sends IP datagrams to
the PE using this MAC address.
The PE receives the IP datagrams and forwards them to Host B.
Local proxy ARP
Figure 1-603 illustrates how local proxy ARP is implemented between Host A and Host
B.
Host A and Host B belong to the same bridge domain (BD) but cannot communicate at
Layer 2 because port isolation is enabled on the CE. To enable Host A and Host B to
communicate, a VBDIF interface (VBDIF2) is configured on the CE to implement local
proxy ARP.
a. Host A sends an ARP request message for the MAC address of Host B.
b. After the CE receives the ARP request message, the CE checks the destination IP
address of the message and finds that it is not its own IP address and determines
that the requested MAC address is not the MAC address of VBDIF2. The CE then
searches its ARP table for the ARP entry indicating the mapping between the IP and
MAC addresses of Host B.
If the CE finds this ARP entry in its ARP table, the CE checks whether local
proxy ARP is enabled.
If local proxy ARP is enabled on the CE, the CE sends the MAC address
of VBDIF2 to Host A.
If local proxy ARP is not enabled on the CE, the CE discards the ARP
request message.
If the CE does not find this ARP entry in its ARP table, the CE discards the
ARP request message and checks whether local proxy ARP is enabled.
If local proxy ARP is enabled on the CE, the CE broadcasts an ARP
request message to request Host B's MAC address. After receiving an
ARP reply message from Host B, the CE generates an ARP entry for Host
B.
If local proxy ARP is not enabled on the CE, the CE does not perform any
operations.
c. After Host A learns the MAC address of VBDIF2, Host A sends IP datagrams to
the CE using this MAC address.
The CE receives the IP datagrams and forwards them to Host B.
Usage Scenario
Table 1-150 describes the usage scenarios for proxy ARP.
Local proxy ARP: In an EVC model, two hosts that need to communicate belong to the
same network segment and the same BD in which user isolation is configured.
Benefits
Proxy ARP offers the following benefits:
Proxy ARP enables a host on a network to consider that the destination host is on the
same network segment. Therefore, the hosts do not need to know the physical network
details or be aware of how the network is divided into subnets.
All processing related to proxy ARP is performed on a gateway, with no configuration
needed on the hosts connecting to it. In addition, proxy ARP affects only the ARP tables
on hosts and does not affect the ARP table and routing table on a gateway.
Proxy ARP can be used when no default gateway is configured for a host or a host
cannot route messages.
1.9.2.2.6 ARP-Ping
Principles
ARP-Ping is classified as ARP-Ping IP or ARP-Ping MAC and is used to maintain a network
on which Layer 2 features are deployed. ARP-Ping uses ARP messages to detect whether an
IP or MAC address to be configured for a device is in use.
ARP-Ping IP
Before configuring an IP address for a device, check whether the IP address is being used
by another device. Generally, the ping operation can be used to check whether an IP
address is being used. However, if a firewall is configured for the device using the IP
address and the firewall is configured not to respond to ping messages, the IP address
may be mistakenly considered available. To resolve this problem, use the ARP-Ping IP
feature. ARP messages are Layer 2 protocol messages and, in most cases, can pass
through a firewall configured not to respond to ping messages.
ARP-Ping MAC
The host's MAC address is the fixed address of the network adapter on the host. It does
not normally need to be configured manually; however, there are exceptions. For
example, if a device has multiple interfaces and the manufacturer does not specify MAC
addresses for these interfaces, the MAC addresses must be configured, or a virtual MAC
address must be configured for a VRRP backup group. Before configuring a MAC
address, use the ARP-Ping MAC feature to check whether the MAC address is being
used by another device.
Related Concepts
ARP-Ping IP
A device obtains the specified IP address and outbound interface number from the
configuration management plane, saves them to the buffer, constructs an ARP request
message, and broadcasts the message on the outbound interface. If the device does not
receive an ARP reply message within a specified period, the device displays a message
indicating that the IP address is not being used by another device. If the device receives
an ARP reply message, the device compares the source IP address in the ARP reply
message with the IP address stored in the buffer. If the two IP addresses are the same, the
device displays the source MAC address in the ARP reply message and displays a
message indicating that the IP address is being used by another device.
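The ARP-Ping IP check above can be sketched as follows. The ARP exchange itself is abstracted away: the sketch only receives the (source IP, source MAC) pairs of replies that arrived before the timeout, which is an assumption of this example.

```python
# Minimal sketch of the ARP-Ping IP check described above; replies is a list
# of (src_ip, src_mac) pairs received before the timeout expires.
def arp_ping_ip(target_ip, replies):
    """Return the MAC address using target_ip, or None if the address appears unused."""
    for src_ip, src_mac in replies:
        if src_ip == target_ip:        # compare with the buffered target address
            return src_mac             # the IP address is already in use
    return None                        # no matching reply: address appears free
```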
ARP-Ping MAC
The ARP-Ping MAC process is similar to the ping process but ARP-Ping MAC is
applicable only to directly connected Ethernet LANs or Layer 2 Ethernet virtual private
networks (VPNs). A device obtains the specified MAC address and outbound interface
number (optional) from the configuration management plane, constructs an Internet
Control Message Protocol (ICMP) Echo Request message, and broadcasts the message
on all outbound interfaces. If the device does not receive an ICMP Echo Reply message
within a specified period, the device displays a message indicating that the MAC address
is not being used by another device. If the device receives an ICMP Echo Reply message
within a specified period, the device compares the source MAC address in the message
with the MAC address stored on the device. If the two MAC addresses are the same, the
device displays the source IP address in the ICMP Echo Reply message and displays a
message indicating that the MAC address is being used by another device.
Implementation
ARP-Ping IP implementation
The arp-ping ip command cannot be used to ping the device's own IP address, whereas the ping
command supports this.
ARP-Ping MAC implementation
As shown in Figure 1-605, Device A uses ARP-Ping MAC to check whether MAC
address 0013-46E7-2EF5 is being used by another host. After receiving ICMP Echo
Reply messages from all hosts on the network, Device A displays the IP address of the
host with the MAC address 0013-46E7-2EF5 and displays a message indicating that the
MAC address is being used by another host.
The ARP-Ping MAC implementation process is as follows:
a. After MAC address 0013-46E7-2EF5 is specified using a command line on Device
A, Device A broadcasts an ICMP Echo Request message and starts a timer for
ICMP Echo Reply messages.
b. After receiving the ICMP Echo Request message, all the other hosts on the same
LAN send ICMP Echo Reply messages to Device A.
c. After Device A receives an ICMP Echo Reply message from a host, Device A
compares the source MAC address in the message with the MAC address specified
in the command line.
If the two MAC addresses are the same, Device A displays the source IP
address in the ICMP Echo Reply message and displays a message indicating
that the MAC address is being used by another host. Meanwhile, Device A
stops the timer for ICMP Echo Reply messages.
If the two MAC addresses are different, Device A discards the ICMP Echo
Reply message and displays a message indicating that the MAC address is not
being used by another host.
If Device A does not receive any ICMP Echo Reply messages before the ICMP
Echo Reply message timer expires, it displays a message indicating that the MAC
address is not being used by another host.
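The ARP-Ping MAC matching step above can be sketched similarly. Here the ICMP Echo Reply messages collected before the timer expires are simulated as (source IP, source MAC) pairs; that representation is an assumption of this example.

```python
# Sketch of the ARP-Ping MAC comparison above; echo_replies simulates the
# (src_ip, src_mac) pairs received before the reply timer expires.
def arp_ping_mac(target_mac, echo_replies):
    """Return the IP address of the host using target_mac, or None if no host matched."""
    for src_ip, src_mac in echo_replies:
        if src_mac.lower() == target_mac.lower():
            return src_ip              # MAC in use: report the host's IP address
        # replies with a different source MAC are discarded
    return None                        # timer expired with no match
```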
Usage Scenario
ARP-Ping applies to directly connected Ethernet LANs or Layer 2 Ethernet VPNs.
Benefits
ARP-Ping checks whether an IP or MAC address to be configured is being used by another
device, preventing address conflict.
Background
Figure 1-606 shows a typical network topology with a VRRP backup group deployed. In the
topology, Device A is a master device, and Device B is a backup device. In normal
circumstances, Device A forwards uplink and downlink traffic. If Device A or the link
between Device A and the Switch becomes faulty, a master/backup VRRP switchover is
triggered to switch Device B to the Master state. Device B needs to advertise a network
segment route to a device on the network side to direct downlink traffic to Device B. If
Device B has not learned ARP entries from a device on the user side, the downlink traffic is
interrupted.
Dual-device ARP hot backup applies in both Virtual Router Redundancy Protocol (VRRP) and enhanced
trunk (E-Trunk) scenarios. This section describes the implementation of dual-device ARP hot backup in
VRRP scenarios.
Implementation
After you deploy dual-device ARP hot backup, the new master device forwards the downlink
traffic without learning ARP entries again. Dual-device ARP hot backup ensures downlink
traffic continuity.
As shown in Figure 1-607, a VRRP backup group is configured on Device A and Device B.
Device A is a master device, and Device B is a backup device. Device A forwards uplink and
downlink traffic.
If Device A or the link between Device A and the Switch becomes faulty, a master/backup
VRRP switchover is triggered to switch Device B to the Master state. Device B needs to
advertise a network segment route to a device on the network side to direct downlink traffic to
Device B.
Before you deploy dual-device ARP hot backup, Device B does not learn ARP entries
from a device on the user side and therefore a large number of ARP Miss messages are
transmitted. As a result, system resources are consumed and downlink traffic is
interrupted.
After you deploy dual-device ARP hot backup, Device B backs up ARP information on
Device A in real time. When Device B receives downlink traffic, it forwards the
downlink traffic based on the backup ARP information.
Related Concepts
VRRP is a fault-tolerant protocol. VRRP groups several routers into a virtual router. If the
next hop of a router is faulty, VRRP switches traffic to another router for transmission,
ensuring the continuity and reliability of communication.
For details about VRRP, see the chapter "VRRP" in the NE20E Feature Description - Network
Reliability.
E-Trunk is an inter-device link aggregation mechanism, which aggregates links between
devices to provide device-level reliability. For details about E-Trunk, see the chapter
"E-Trunk" in the NE20E Feature Description - LAN and MAN Access.
Usage Scenario
Dual-device ARP hot backup applies when VRRP or E-Trunk is deployed to implement a
master/backup device switchover.
To ensure that ARP entries are completely backed up, set the VRRP or E-Trunk switchback delay to a
value greater than the number of ARP entries that need to be backed up divided by the slowest backup
speed.
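The delay rule above is a simple division. In this worked example the entry count and backup speed are made-up numbers for illustration only.

```python
# Worked example of the switchback-delay rule above; the figures are
# hypothetical, chosen only to illustrate the arithmetic.
def min_switchback_delay(arp_entries, slowest_backup_speed):
    """The delay must exceed entries / speed (entries per second)."""
    return arp_entries / slowest_backup_speed

# e.g. 120000 entries backed up at 1000 entries/s need a delay greater than 120 s
delay = min_switchback_delay(120000, 1000)
```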
Benefits
Dual-device ARP hot backup prevents downlink traffic from being interrupted because the
backup device does not learn ARP entries of a device on the user side during a master/backup
device switchover, which improves network reliability.
1.9.2.3 Applications
1.9.2.3.1 Intra-VLAN Proxy ARP Application
Networking Description
As shown in Figure 1-608, to facilitate management, communication isolation is
implemented for various departments on the intranet of a company. For example, although
Host A of the president's office, Host B of the R&D department, and Host C of the financial
department belong to the same VLAN, they cannot communicate at Layer 2. However, the
business requires that the president's office communicate with the financial department. To
permit this, enable intra-VLAN proxy ARP on the CE so that Host A can communicate with
Host C.
Before intra-VLAN proxy ARP is enabled, if Host A sends an ARP request message for
the MAC address of Host C, the message cannot be broadcast to hosts of the R&D
department and financial department because port isolation is configured on the CE.
Therefore, Host A can never learn the MAC address of Host C and cannot communicate
with Host C.
After intra-VLAN proxy ARP is enabled, the CE does not discard the ARP request
message sent from Host A even if the destination IP address in the message is not its own
IP address. Instead, the CE sends the MAC address of its VLANIF4 to Host A. Host A
then sends IP datagrams to this MAC address.
Feature Deployment
Configure interface 1, which is a Layer 3 interface, on the CE, and enable intra-VLAN proxy
ARP. After the deployment, the CE sends the MAC address of its interface 1 to Host A when
receiving a request for the MAC address of Host C from Host A. Host A then sends IP
datagrams to the CE, which forwards the IP datagrams to Host C. Consequently, the
communication between Host A and Host C is implemented.
Networking Description
As shown in Figure 1-609, the intranet of an organization communicates with the Internet
through the gateway PE. To prevent network attackers from obtaining private information by
modifying ARP entries on the PE, deploy static ARP.
Before static ARP is deployed, the PE dynamically learns and updates ARP entries using
ARP messages. However, dynamic ARP entries can be aged and overwritten by new
dynamic ARP entries. Therefore, network attackers can send fake ARP messages to
modify ARP entries on the PE to obtain the private information of the organization.
After static ARP is deployed, ARP entries on the PE are manually configured and
maintained by a network administrator. Static ARP entries are neither aged nor
overwritten by dynamic ARP entries. Therefore, deploying static ARP can prevent
network attackers from sending fake ARP messages to modify ARP entries on the PE,
and information security is ensured.
Feature Deployment
Deploy static ARP on the PE to set up fixed mapping between IP and MAC addresses of hosts
on the intranet. This can prevent network attackers from sending fake ARP messages to
modify ARP entries on the PE, ensuring the stability and security of network communication
and minimizing the risk of private information being stolen.
Terms
Term Definition
ARP Address Resolution Protocol. An Internet protocol used to map IP
addresses to MAC addresses.
1.9.3 ACL
1.9.3.1 Introduction
Definition
As its name indicates, an Access Control List (ACL) is a list of matching clauses. These
clauses are matching rules that tell the device whether to perform an action on a packet.
Purpose
ACLs are used to ensure reliable data transmission between devices on a network by
performing the following:
Defend the network against various attacks, such as attacks by using IP, Transmission
Control Protocol (TCP), or Internet Control Message Protocol (ICMP) packets.
Control network access. For example, ACLs can be used to control enterprise network
user access to external networks, to specify the specific network resources accessible to
users, and to define the time ranges in which users can access networks.
Limit network traffic and improve network performance. For example, ACLs can be
used to limit the bandwidth for upstream and downstream traffic and to apply charging
rules to user requested bandwidth, therefore achieving efficient utilization of network
resources.
Benefits
ACL rules are used to classify packets. After ACL rules are applied to a router, the router
permits or denies packets based on them. The use of ACL rules therefore greatly improves
network security.
An ACL is a set of rules. It identifies a type of packet but does not filter packets. Other ACL-associated
functions are used to filter identified packets.
1.9.3.2 Principles
1.9.3.2.1 Basic ACL Concepts
ACL type
ACLs can be classified as ACL4 or ACL6 based on whether they support IPv4 or IPv6.
The following table outlines ACL4 classification based on functions.
For easy memorization, ACLs can be defined using names instead of numbers, much as
domain names replace IP addresses. ACLs of this type are called named ACLs, whereas the
ACLs described above are called numbered ACLs.
The only difference between named and numbered ACLs is that named ACLs are easier to
recognize owing to their descriptive names.
When naming an ACL, you can specify a number for it. If no number is specified, the system
will allocate one automatically.
One name is only for one ACL. Multiple ACLs cannot have the same name, even if they are of different
types.
ACL step
An ACL step is the difference between two adjacent ACL rule numbers that are automatically
allocated. For example, if the step is set to 5, the rule numbers are multiples of 5, such as 5, 10,
15, and 20.
If an ACL step is changed, rules in the ACL are automatically renumbered. For example,
if the ACL step is changed from 5 to 2, the original rule numbers 5, 10, 15, and 20 will
be renumbered as 2, 4, 6, and 8.
If the default step 5 is restored for an ACL, the system immediately renumbers the rules
in the ACL based on the default step. For example, if the step of ACL 3001 is 2, rules in
ACL 3001 are numbered 0, 2, 4, and 6. If the default step 5 is restored, the rules will be
renumbered as 5, 10, 15, and 20.
The ACL step facilitates rule maintenance and makes it convenient to add new ACL rules.
If a user has created four rules numbered 0, 5, 10, and 15 in an ACL, the user can add a
rule (for example, rule 1) between rules 0 and 5.
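The renumbering behavior described above can be sketched as follows; representing an ACL as an ordered mapping from rule ID to rule body is an assumption of this example.

```python
# Sketch of ACL step renumbering as described above; an ACL is modeled as an
# ordered dict of rule ID -> rule body (a simplifying assumption).
def renumber(rules, step):
    """Renumber rules, in their current order, as step, 2*step, 3*step, ..."""
    return {step * (i + 1): body for i, body in enumerate(rules.values())}

acl = {5: "permit A", 10: "permit B", 15: "deny C", 20: "deny D"}
# Changing the step from 5 to 2 renumbers the four rules as 2, 4, 6, and 8;
# restoring the default step 5 renumbers them back to 5, 10, 15, and 20.
```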
What is "Matched"
Matched: the ACL exists and contains a rule to which the packet conforms, regardless of
whether the rule action is permit or deny.
Mismatched: the ACL does not exist, the ACL contains no rules, or the packet does not
conform to any rule in the ACL.
A rule is identified by a rule ID, which is either configured by a user or generated by the system
according to the ACL step. All rules in an ACL are arranged in ascending order of rule IDs.
If rule IDs are automatically allocated, there is a certain space between two adjacent rule IDs, whose
size depends on the ACL step. For example, if the ACL step is set to 5, the difference between two
adjacent rule IDs is 5, and the allocated IDs are 5, 10, 15, and so on. If the ACL step is 2, the rule IDs
generated automatically by the system start from 2. In this manner, a user can add a rule before the
first rule.
In the configuration file, rules are displayed in ascending order of rule IDs, not in the order in
which they were configured.
Rules can be arranged in two modes: configuration mode and auto mode. The default mode is
configuration mode.
If the configuration mode is used, users can either set rule IDs or allow the device to
automatically allocate rule IDs based on the step.
If rule IDs are specified when rules are configured, the rules are inserted at the positions
specified by the rule IDs. For example, three rules with IDs 5, 10, and 15 exist on a
device. If a new rule with ID 3 is configured, the rules are displayed in ascending order:
3, 5, 10, and 15. This is equivalent to inserting a rule before rule 5. If users do not set rule
IDs, the device automatically allocates rule IDs based on the step. For example, if the
ACL step is set to 5, the interval between two adjacent rule IDs is 5, and the allocated
IDs are 5, 10, 15, and so on.
If the ACL step is set to 2, the device allocates rule IDs starting from 2. The step allows
users to insert new rules, facilitating rule maintenance. For example, the ACL step is 5
by default. If a user does not configure a rule ID, the system automatically generates
rule ID 5 for the first rule. If the user intends to add a new rule before rule 5, the user only
needs to specify a rule ID smaller than 5. After automatic realignment, the new rule
becomes the first rule.
In the configuration mode, the system matches packets against rules in ascending order of
rule IDs. As a result, a rule configured later may be matched earlier.
If the auto mode is used, the system automatically allocates rule IDs and places the most
precise rule at the front of the ACL based on the depth-first principle. This is
implemented by comparing address wildcards: the smaller the wildcard, the narrower
the specified range.
For example, 129.102.1.1 0.0.0.0 specifies the single host with the IP address 129.102.1.1,
whereas 129.102.1.1 0.0.0.255 specifies the network segment ranging from 129.102.1.0
to 129.102.1.255. The former specifies a narrower range and is therefore placed before
the latter.
The detailed operations are as follows:
− For basic ACL rules, the source address wildcards are compared. If the source
address wildcards are the same, the system matches packets against the ACL rules
based on the configuration order.
− For advanced ACL rules, the protocol ranges and then the source address wildcards
are compared. If both the protocol ranges and the source wildcards are the same, the
destination address wildcards are then compared. If the destination address
wildcards are also the same, the ranges of source port numbers are compared with
the smaller range being allocated a higher precedence. If the ranges of source port
numbers are still the same, the ranges of destination port numbers are compared
with the smaller range being allocated a higher precedence. If the ranges of
destination port numbers are still the same, the system matches packets against ACL
rules based on the configuration order of rules.
For example, suppose a wide range of packets is specified for packet filtering, and it is later required
that packets matching a specific feature within that range be allowed to pass. If the auto mode is
configured, the administrator only needs to define a more specific rule and does not need to re-order
the rules, because a narrower range is allocated a higher precedence in the auto mode.
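The wildcard comparison for basic ACL rules can be sketched in Python. This is an illustrative model that orders rules by source-wildcard size only, with ties kept in configuration order; the function names are not device terminology:

```python
import ipaddress

def wildcard_size(wildcard):
    """Number of addresses covered by a wildcard mask: each wildcard bit
    set to 1 doubles the range, i.e. 2 ** popcount(wildcard)."""
    return 2 ** bin(int(ipaddress.IPv4Address(wildcard))).count("1")

def auto_order(rules):
    """Order basic ACL rules depth-first: narrower source wildcard first.
    Python's sort is stable, so equal wildcards keep configuration order,
    matching the tie-breaking rule described above."""
    return sorted(rules, key=lambda r: wildcard_size(r[1]))

rules = [("129.102.1.1", "0.0.0.255", "permit"),   # network segment
         ("129.102.1.1", "0.0.0.0",   "deny")]     # single host
# The host rule (narrower range) is placed before the segment rule.
print(auto_order(rules)[0])   # ('129.102.1.1', '0.0.0.0', 'deny')
```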
Table 1-153 describes the depth-first principle for matching ACL rules.
Table 1-154 Default matching results of application modules in the mismatched case
Example
The following commands are configured one after another:
rule deny ip dscp 30 destination 1.1.0.0 0.0.255.255
rule permit ip dscp 30 destination 1.1.1.0 0.0.0.255
If the config mode is used, the rules in the ACL are displayed as follows:
acl 3000
rule 5 deny ip dscp 30 destination 1.1.0.0 0.0.255.255
rule 10 permit ip dscp 30 destination 1.1.1.0 0.0.0.255
If the auto mode is used, the rules in the ACL are displayed as follows:
acl 3000
rule 1 permit ip dscp 30 destination 1.1.1.0 0.0.0.255
rule 2 deny ip dscp 30 destination 1.1.0.0 0.0.255.255
If the device receives a packet with DSCP value 30 and destination IP address 1.1.1.1, the
packet is dropped when the config mode is used, but the packet is allowed to pass when the
auto mode is used.
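The differing results can be reproduced with a first-match sketch in Python. This is an illustrative model, not device code; rules are tried in their displayed order and the first matching rule decides:

```python
import ipaddress

def matches(packet_ip, rule_net, rule_wildcard):
    """An address matches a rule when all bits outside the wildcard are equal."""
    ip = int(ipaddress.IPv4Address(packet_ip))
    net = int(ipaddress.IPv4Address(rule_net))
    wc = int(ipaddress.IPv4Address(rule_wildcard))
    return (ip & ~wc) == (net & ~wc)

def first_match(dest_ip, rules):
    """Return the action of the first rule the destination address matches."""
    for action, net, wildcard in rules:
        if matches(dest_ip, net, wildcard):
            return action
    return None  # mismatched

# Rule order as displayed in config mode vs. auto mode for ACL 3000 above.
config_mode = [("deny", "1.1.0.0", "0.0.255.255"),
               ("permit", "1.1.1.0", "0.0.0.255")]
auto_mode = [("permit", "1.1.1.0", "0.0.0.255"),
             ("deny", "1.1.0.0", "0.0.255.255")]
print(first_match("1.1.1.1", config_mode))  # deny -> packet dropped
print(first_match("1.1.1.1", auto_mode))    # permit -> packet passes
```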
1.9.3.3 Applications
1.9.3.3.1 ACL Applied to Telnet (VTY), SNMP, FTP, and TFTP
Filtering Principle
When an ACL is applied to Telnet, SNMP, FTP, or TFTP:
If the source IP address of a user matches a permit rule, the user is allowed to log in.
If the source IP address of a user matches a deny rule, the user is prohibited from
logging in.
If the source IP address of a user does not match any rule in the ACL, the user is
prohibited from logging in.
If there is no rule in the ACL, or the ACL does not exist, all users are allowed to log in.
The default behavior is deny if the source IP address of the user does not match any rule in the ACL
applied to FTP.
When an ACL is applied to SNMP and the device receives a packet whose community name field is
null, the device directly discards the packet without filtering it against the ACL rules, and generates a
log about the community name error. ACL filtering is triggered only when the community name is not
null.
If the NMS server belongs to a VPN, the VPN instance must be specified in the ACL rule.
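The login-filtering principle above can be sketched in Python. This is a simplified model with exact-address matching; the function name and the rule representation are illustrative:

```python
def login_allowed(src_ip, acl):
    """Decide whether a user may log in, per the Telnet/SNMP/FTP/TFTP
    filtering principle:
      - matches a permit rule -> allowed
      - matches a deny rule   -> prohibited
      - matches no rule       -> prohibited
      - ACL absent or empty   -> allowed
    `acl` is a list of (action, ip) pairs, or None if the ACL does not exist.
    """
    if not acl:                      # ACL missing, or contains no rules
        return True
    for action, ip in acl:
        if ip == src_ip:             # simplified exact-address match
            return action == "permit"
    return False                     # mismatched -> prohibited

acl = [("permit", "10.1.1.1"), ("deny", "10.1.1.2")]
print(login_allowed("10.1.1.1", acl))  # True  (permit rule)
print(login_allowed("10.1.1.2", acl))  # False (deny rule)
print(login_allowed("10.1.1.3", acl))  # False (mismatched)
print(login_allowed("10.1.1.3", None)) # True  (ACL does not exist)
```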
Figure 1-611 Relationships between an interface, traffic policy, traffic behavior, traffic classifier,
and ACL
By default, the precedences of classifiers A, B, and C are 1, 2, and 3, which is the same as the
configuration order. To move classifier A to the end, run the following command:
classifier A behavior A precedence 4
Precedence 1 is then unused, so you can add a classifier (named D) before classifier B with the
following command:
classifier D behavior D precedence 1
You can also add classifier D without specifying a precedence by using the following command:
classifier D behavior D
As shown in Figure 1-612, for each classifier, if the logic between if-match clauses is OR,
a packet is matched against the if-match clauses in the order in which they were configured.
Once the packet matches an if-match clause:
If no ACL is applied to the matched if-match clause, the related behavior is
executed.
If an ACL is applied to the matched if-match clause and the packet matches a
permit rule, the related behavior is executed.
If an ACL is applied to the matched if-match clause and the packet matches a
deny rule, the packet is discarded directly and the related behavior is not executed.
If the packet matches no if-match clause, the related behavior is not executed, and the
next classifier is processed for the packet.
If an ACL is applied to one of the if-match clauses, each rule of the ACL is combined with
all of the other if-match clauses.
Note: The rules of the ACL are not combined with each other. Therefore, the order of the
if-match clauses in AND logic does not affect the final matching result, but the order of the
rules in the ACL still affects the final result.
For example, in the following configuration,
#
acl 3000
rule 5 permit ip source 1.1.1.1 0
rule 10 deny ip source 2.2.2.2 0
#
traffic classifier example operator and
if-match acl 3000
if-match dscp af11
#
The device combines all the if-match clauses. The combination result is the same as the
following configuration.
#
acl 3000
rule 5 permit ip source 1.1.1.1 0 dscp af11
rule 10 deny ip source 2.2.2.2 0 dscp af11
#
traffic classifier example operator or
if-match acl 3000
#
traffic behavior example
remark dscp af22
#
traffic policy example
share-mode
classifier example behavior example
#
interface GigabitEthernet2/0/0
traffic-policy example inbound
#
Then, the device processes the combined if-match clauses according to the OR logic
procedure. The result is as follows: the DSCP of a packet is re-marked as AF22 if the packet is
received from GE2/0/0, its DSCP is AF11, and its source IP address is 1.1.1.1/32; a packet is
discarded if it is received from GE2/0/0, its DSCP is AF11, and its source IP address is
2.2.2.2/32; other packets are forwarded directly because they do not match any rule.
With the default license, AND logic permits only one if-match clause with an ACL applied, whereas
OR logic permits multiple if-match clauses with ACLs applied.
If the license is modified so that multiple if-match clauses with ACLs applied are permitted in AND
logic, the combination principle is as follows:
For traffic behavior mirroring or sampling, even if a packet matches a rule that defines a deny action, the
traffic behavior takes effect for the packet.
A permit or deny action can be specified in an ACL for a traffic classifier to work with
specific traffic behaviors as follows:
If the deny action is specified in an ACL, the packet that matches the ACL is denied,
regardless of what the traffic behavior defines.
If the permit action is specified in an ACL, the traffic behavior applies to the packet that
matches the ACL.
For example, the following configuration leads to this result: packets whose source IP
addresses match 50.0.0.0 0.255.255.255 have their IP precedence re-marked as 7; packets
whose source IP addresses match 60.0.0.0 0.255.255.255 are dropped; packets with other
source IP addresses, such as 70.0.0.1, are forwarded with the IP precedence unchanged.
acl 3999
rule 5 permit ip source 50.0.0.0 0.255.255.255
rule 10 deny ip source 60.0.0.0 0.255.255.255
traffic classifier acl
if-match acl 3999
traffic behavior test
remark ip-pre 7
traffic policy test
classifier acl behavior test
interface GigabitEthernet1/0/1
traffic-policy test inbound
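The interaction described above between the ACL action and the traffic behavior can be sketched in Python. This is an illustrative model; the `ignores_deny` flag stands in for behaviors such as mirroring or sampling, which take effect even for denied packets:

```python
def apply_policy(acl_action, behavior, ignores_deny=False):
    """Combine the ACL action with a traffic behavior.

    - deny in the ACL: the packet is dropped regardless of the behavior,
      except for behaviors such as mirroring or sampling.
    - permit in the ACL: the behavior applies to the packet.
    - None (mismatched): the behavior does not apply; forwarded as is.
    """
    if acl_action == "deny" and not ignores_deny:
        return "drop"
    if acl_action == "permit" or ignores_deny:
        return behavior
    return "forward unchanged"

print(apply_policy("permit", "remark ip-pre 7"))  # remark ip-pre 7
print(apply_policy("deny", "remark ip-pre 7"))    # drop
print(apply_policy(None, "remark ip-pre 7"))      # forward unchanged
print(apply_policy("deny", "mirror", ignores_deny=True))  # mirror
```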
Node permit, rule permit, route matches the rule: The route is considered to match the
if-match clause, and the device continues to process the rest of the if-match clauses in the
same node. If the route matches all if-match clauses, the apply clause is executed and the
device does not match the route against the rest of the nodes. If the route does not match
all if-match clauses, the apply clause is not executed, and the device continues to process
the rest of the nodes for the route. If there is no rest node, the route is "deny".
Node permit, rule permit, route does not match the rule: The route is considered not to
match the if-match clause, and the apply clause is not executed. The device continues to
process the rest of the nodes for the route. If there is no rest node, the mismatched route is
"deny".
Node permit, rule deny, route matches or does not match the rule: The node does not take
effect, and the device continues to process the rest of the nodes for the route. If there is no
rest node, the route is "deny".
Node deny, rule permit, route matches the rule: The route is "deny", and the apply clause
is not executed. The device does not continue to process the rest of the nodes for the route.
Node deny, rule permit, route does not match the rule: The route does not match the
if-match clause, and the apply clause is not executed. The device continues to process the
rest of the nodes for the route. If there is no rest node, the route is "deny".
Node deny, rule deny, route matches or does not match the rule: The node does not take
effect, and the device continues to process the rest of the nodes for the route. If there is no
rest node, the route is "deny".
The device continues to process the rest of the nodes if the route is denied by the ACL.
The device continues to process the rest of the nodes if the route does not match any rule in the ACL.
It is recommended that you configure deny rules with smaller numbers to filter out unwanted routes,
and then configure permit rules with larger numbers in the same ACL to receive or advertise the
other routes.
Alternatively, configure permit rules with smaller numbers to permit the routes to be received or
advertised by the device, and then configure deny rules with larger numbers in the same ACL to
filter out unwanted routes.
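The node/rule matching matrix above can be condensed into a small Python function. This is an illustrative model of the table; the return values name the outcome of evaluating a single node, and "deny if no rest node" applies whenever 'next-node' is returned and no further node exists:

```python
def node_result(node_mode, rule_action, route_matches_rule):
    """Outcome of evaluating one route-policy node whose if-match
    references an ACL, per the matching matrix:
      'apply'     - clause matched; the apply clause may be executed
      'next-node' - node does not take effect; try the rest of the nodes
      'deny-stop' - route denied; no further nodes are processed
    """
    if rule_action == "deny":
        return "next-node"            # a deny rule: the node has no effect
    if not route_matches_rule:
        return "next-node"            # mismatched: try the next node
    # permit rule and the route matches it:
    return "apply" if node_mode == "permit" else "deny-stop"

print(node_result("permit", "permit", True))   # apply
print(node_result("deny", "permit", True))     # deny-stop
print(node_result("permit", "deny", True))     # next-node
print(node_result("deny", "permit", False))    # next-node
```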
Example 2
In the following configuration, only the static route 20.1.0.0/24 can be imported to BGP, and
its local-preference is modified. The "destination 10.1.0.0 0.0.0.255" parameter does not take
effect.
acl name example number 42768
rule 5 permit ip source 20.1.0.0 0.0.0.255 destination 10.1.0.0 0.0.0.255
#
route-policy policy1 permit node 10
if-match acl example
apply local-preference 1300
#
bgp 100
import-route static route-policy policy1
#
Example 3
In the following configuration, routes to 10.1.0.0/24 cannot be advertised to BGP VPNv4
peer 1.1.1.1, regardless of which L3VPN the denied routes belong to. The "vpn-instance
vpnb" parameter does not take effect.
acl example number 2000
rule 5 deny ip source 10.1.0.0 0.0.0.255 vpn-instance vpnb
rule 10 permit
#
route-policy policy1 permit node 10
if-match acl example
#
bgp 100
peer 1.1.1.1 as-number 100
peer 1.1.1.1 connect-interface LoopBack1
#
ipv4-family vpnv4
policy vpn-target
peer 1.1.1.1 enable
peer 1.1.1.1 route-policy policy1 export
#
ACL rule. The route 10.1.1.0/16 is considered to mismatch the ACL rule because it is outside
the segment range of 10.1.1.0/24.
acl number 2000
rule 1 permit source 10.1.1.0 0.0.0.255
rule 99 deny any
10.1.1.0/24 matches the deny rule in node 10, so 10.1.1.0/24 is denied, the apply clause
in node 10 is not executed for 10.1.1.0/24, and the device continues to process node 20.
As a result, 10.1.1.0/24 is imported to BGP and its local-preference is not changed.
10.1.2.0/24 does not match any rule in node 10, so the apply clause in node 10 is not
executed, and the device continues to process node 20 for 10.1.2.0/24. As a result,
10.1.2.0/24 is imported to BGP.
As a result, both static routes are imported to BGP, and the local-preference of neither route
is modified.
Node Is Permit, Rule Is Permit.
Configuration example:
acl number 2000
rule 1 permit source 10.1.1.0 0.0.0.255
#
route-policy policy1 deny node 10
if-match acl 2000
apply local-preference 1300
#
route-policy policy1 permit node 20
#
bgp 100
import-route static route-policy policy1
#
If you don't want to advertise the routes to 10.1.1.0/24 and 10.1.2.0/24 on RTB, you can
configure the following commands.
[RTB] acl 2000
[RTB-acl2000] rule 5 deny source 10.1.1.0 0.0.0.255
[RTB-acl2000] rule 10 deny source 10.1.2.0 0.0.0.255
[RTB-acl2000] rule 15 permit source any
[RTB] ospf 100
[RTB-ospf-100] filter-policy acl 2000 export
Filter-policy affects only the routes advertised to or received from neighbors, not the routes
imported from one routing protocol to another. To import routes learned by other routing protocols,
run the import-route command in the OSPF view.
If an unsupported matching option is configured for filter-policy, the matching result of that
option is "permit".
Example 1
In the following configuration, all static routes are advertised to the BGP peer.
acl name example number 42768
rule 5 deny ip destination 10.1.0.0 0.0.0.255
#
bgp 100
ipv4-family unicast
filter-policy acl-name example export
#
Example 2
In the following configuration, only the static route 20.1.0.0/24 can be advertised to the BGP
peer. The "destination 10.1.0.0 0.0.0.255" parameter does not take effect.
acl name example number 42768
rule 5 permit ip source 20.1.0.0 0.0.0.255 destination 10.1.0.0 0.0.0.255
#
bgp 100
ipv4-family unicast
filter-policy acl-name example export
#
Example 3
In the following configuration, routes to 10.1.0.0/24 cannot be advertised to any BGP VPNv4
peer, regardless of which L3VPN the denied routes belong to. The "vpn-instance vpnb"
parameter does not take effect.
acl number 2000
rule 5 deny ip source 10.1.0.0 0.0.0.255 vpn-instance vpnb
rule 10 permit
#
route-policy policy1 permit node 10
if-match acl example
#
bgp 100
ipv4-family vpnv4
filter-policy 2000 export
#
Table 1-157 The default matching result of mismatched routes in multicast policy
An advanced ACL applied to a multicast policy supports only two or three parameters:
− Most multicast policies support only the source, destination, and time-range parameters.
− A few multicast policies support only the source and time-range parameters.
− Other multicast policies support only the destination and time-range parameters.
Only advanced ACLs can be applied to multicast policies as named ACLs. If the named ACL number
is out of the range, the ACL does not take effect.
Module: TCP/IP attack defense
Function: Directly discards TCP/IP attack packets. The TCP/IP attack defense function is
enabled by default.
The whitelist, blacklist, and user-defined flows use ACLs to define the characteristics of the flows.
Each CPU defend policy can be configured with one whitelist, one blacklist, and one or more
user-defined flows, as shown in the following figure.
cpu-defend policy 4
whitelist acl 2001
blacklist acl 2002
user-defined-flow 1 acl 2003
user-defined-flow 2 acl 2003
user-defined-flow 3 acl 2004
#
cpu-defend policy 5
whitelist acl 2005
By default, a packet sent to the CPU is matched in the order whitelist --> blacklist --> user-defined flow.
This order can be modified using commands.
1. Performs URPF, TCP/IP attack defense, and GTSM checks. Packets that pass the checks
proceed to the next step; packets that fail the checks are discarded.
2. Matches packets against the whitelist. Performs CAR and goes to step 5 for packets that
match a permit rule. Discards packets that match a deny rule. Proceeds to the next step
for mismatched packets.
3. Matches packets against the blacklist. Performs CAR and goes to step 5 for packets that
match a permit rule. Discards packets that match a deny rule. Proceeds to the next step
for mismatched packets.
4. Matches packets against the user-defined flows. Performs CAR and goes to step 5 for
packets that match a permit rule. Discards packets that match a deny rule. Proceeds to
the next step for mismatched packets.
5. Checks all packets based on application layer association. Sends only the packets
belonging to enabled protocols; packets belonging to disabled protocols are discarded.
In steps 2, 3, and 4, "mismatched" includes the following cases:
The packet matches no rule in the ACL.
The ACL does not exist.
The ACL exists but contains no rules.
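The match order in steps 2 through 4 can be sketched as follows. This is an illustrative Python model, not device code; `match()` returns 'permit', 'deny', or None for the mismatched cases listed above:

```python
def match(packet, acl):
    """Return 'permit'/'deny' from the first matching rule, or None when
    the ACL is missing, empty, or the packet matches no rule (the three
    "mismatched" cases)."""
    if not acl:
        return None
    for action, src in acl:
        if src == packet:            # simplified source-address match
            return action
    return None

def cpu_defend(packet, whitelist, blacklist, user_flows):
    """Default order: whitelist -> blacklist -> user-defined flows.
    permit -> CAR, then step 5; deny -> discard; mismatched -> next list;
    nothing matched -> straight to application-layer association."""
    for name, acl in (("whitelist", whitelist),
                      ("blacklist", blacklist),
                      ("user-defined-flow", user_flows)):
        action = match(packet, acl)
        if action == "permit":
            return "CAR (" + name + ")"
        if action == "deny":
            return "discard"
    return "application-layer association"

whitelist = [("permit", "10.1.1.1")]
blacklist = [("deny", "10.9.9.9")]
print(cpu_defend("10.1.1.1", whitelist, blacklist, None))  # CAR (whitelist)
print(cpu_defend("10.9.9.9", whitelist, blacklist, None))  # discard
```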
Directly drops the management packets received from the non-management interfaces.
The BFD echo packet is looped back through ICMP redirect at the remote end. In the IP
packet that encapsulates the BFD echo packet, the destination address and the source address
are the IP address of the outgoing interface of the local end. Therefore, in the ACL rule, both
the source addresses of the remote end and the local end must be permitted.
BFD passive echo supports only basic ACLs, not advanced ACLs.
If the ACL applied to an established BFD session is modified, or a new ACL is applied to an established
BFD session, the ACL takes effect only after the session is re-established or the session parameters
are modified.
Table 1-160 Matching Principle of the ACL Applied to BFD Passive Echo
Terms
Term Definition
Interface-based ACL A list of rules for packet filtering based on the inbound
interfaces of packets.
Basic ACL A list of rules for packet filtering based on the source IP
addresses of packets.
Advanced ACL A list of rules for packet filtering based on the source or
destination IP addresses of packets and protocol types. It filters
packets based on protocol information, such as TCP source and
destination port numbers and the ICMP type and code.
Layer 2 ACL A list of rules for packet filtering based on the Ethernet frame
header information, such as source or destination Media
Access Control (MAC) addresses, protocol types of Ethernet
frames, or 802.1p priorities.
User ACL A list of rules for packet filtering based on the
source/destination IP address, source/destination service group,
source/destination user group, source/destination port number,
and protocol type.
MPLS-based ACL A list of rules for packet filtering based on the EXP values,
Label values, or TTL values of MPLS packets.
1.9.4 DHCP
1.9.4.1 Introduction
Definition
The Dynamic Host Configuration Protocol (DHCP) dynamically assigns IP addresses to hosts
and centrally manages host configurations. DHCP uses the client/server model. A client
applies to the server for configuration parameters, such as an IP address, subnet mask, and
default gateway address; the server replies with the requested configuration parameters.
DHCPv4 and DHCPv6 are available for dynamic address allocation on IPv4 and IPv6
networks, respectively. Though DHCPv4 and DHCPv6 both use the client/server model, they
are built based on different principles and operate differently.
Purpose
A host can send packets to or receive packets from the Internet after it obtains an IP address,
as well as the router address, subnet mask, and DNS address.
The Bootstrap Protocol (BOOTP) was originally designed for diskless workstations to
discover their own IP addresses, the server address, the name of a file to be loaded into
memory, and the gateway IP address. BOOTP applies to a static scenario in which all hosts
are allocated permanent IP addresses.
However, as the increasing network scale and network complexity complicate network
configuration, the proliferation of portable computers and wireless networks brings about host
mobility, and the increasing number of hosts causes IP address exhaustion, BOOTP is no
longer applicable. To allow hosts to rapidly go online or offline, as well as to improve IP
address usage and support diskless workstations, an automatic address allocation mechanism
is needed based on the original BOOTP architecture.
DHCP was developed to implement automatic address allocation. DHCP extends BOOTP in
the following aspects:
Allows a host to exchange messages with a server to obtain all requested configuration
parameters.
Allows a host to rapidly and dynamically obtain an IP address.
Benefits
DHCP rapidly and dynamically allocates IP addresses, which improves IP address usage and
prevents the waste of IP addresses.
DHCPv4 Architecture
Figure 1-615 shows the DHCPv4 architecture.
DHCPv4 relay agents are not mandatory in the DHCPv4 architecture. A DHCPv4 relay agent is required
only when the server and client are located on different network segments.
DHCPv4 server
A DHCPv4 server processes address allocation, lease extension, and address release
requests originating from a DHCPv4 client or forwarded by a DHCPv4 relay agent and
assigns IP addresses and other configuration parameters to the client.
To protect a DHCP server against network attacks, such as man-in-the-middle attacks, starvation attacks,
and DoS attacks by changing the CHADDR value, configure DHCP snooping on the intermediate device
directly connecting to a DHCP client to provide DHCP security services.
yiaddr (4 bytes): Client IP address assigned by the DHCP server. The DHCP server fills in
this field in a DHCP Reply message.
siaddr (4 bytes): Server IP address from which a DHCP client obtains the startup
configuration file.
giaddr (4 bytes): Gateway IP address, which is the IP address of the first DHCP relay agent.
If the DHCP server and client are located on different network segments, the first DHCP
relay agent fills its own IP address into this field of the DHCP Request message sent by the
client. The relay agent forwards the message to the DHCP server, which uses this field to
determine the network segment where the client resides. The DHCP server then assigns an
IP address on this network segment from an address pool.
The DHCP server also returns a DHCP Reply message to the first DHCP relay agent. The
DHCP relay agent then forwards the DHCP Reply message to the client.
NOTE
If the DHCP Request message passes through multiple DHCP relay agents before reaching the DHCP
server, the value of this field remains the IP address of the first DHCP relay agent. However, the
value of the Hops field increases by 1 each time the message passes through a relay agent.
chaddr (16 bytes): Client hardware address. This field must be consistent with the hardware
type and hardware length fields. When sending a DHCP Request message, the client fills its
hardware address into this field. For Ethernet, a 6-byte Ethernet MAC address must be filled
in this field, with the hardware type and hardware length fields set to 1 and 6, respectively.
sname (64 bytes): Server host name. This field is optional and contains the name of the
server from which a client obtains configuration parameters. The field is filled in by the
DHCP server and must contain a character string that ends with 0.
file (128 bytes): Boot file name specified by the DHCP server for a DHCP client. This field
is optional and is delivered to the client when the IP address is assigned to the client. The
field is filled in by the DHCP server and must contain a character string that ends with 0.
options (variable): Optional parameters field. The length of this field must be at least
312 bytes. This field contains the DHCP message type and configuration parameters
assigned by a server to a client, including the gateway IP address, DNS server IP address,
and IP address lease.
DHCPv4 Options
In the DHCPv4 options field, the first four bytes are decimal numbers 99, 130, 83 and 99,
respectively. This is the same as the magic cookie defined in standard protocols. The
remaining bytes identify several options as defined in standard protocols. One particular
option, the DHCP Message Type option (Option 53), must be included in every DHCP
message. Option 53 defines DHCP message types, including the DHCPDISCOVER,
DHCPOFFER, DHCPREQUEST, DHCPACK, DHCPNAK, DHCPDECLINE,
DHCPRELEASE, and DHCPINFORM messages.
DHCPv4 message types
Table 1-162 lists the DHCPv4 message types.
Type Description
DHCPDISCOVER: A DHCP Discover message is broadcast by a DHCP client to locate a
DHCP server when the client attempts to access a network for the first time.
DHCPOFFER: A DHCP Offer message is sent by a DHCP server in response to a DHCP
Discover message. A DHCP Offer message carries various configuration parameters.
DHCPREQUEST: A DHCP Request message is sent in the following conditions:
After a DHCP client is initialized, it broadcasts a DHCP Request message in response to
the DHCP Offer message sent by a DHCP server.
After a DHCP client restarts, it broadcasts a DHCP Request message to confirm the
configuration, including the assigned IP address.
After a DHCP client obtains an IP address, it unicasts or broadcasts a DHCP Request
message to update the IP address lease.
DHCPACK: A DHCP ACK message is sent by a DHCP server to acknowledge the DHCP
Request message from a DHCP client. After receiving a DHCP ACK message, the DHCP
client obtains the configuration parameters, including the IP address.
DHCPNAK: A DHCP NAK message is sent by a DHCP server to reject the DHCP Request
message from a DHCP client. For example, if a DHCP server cannot find matching lease
records after receiving a DHCP Request message, it sends a DHCP NAK message
indicating that no IP address is available for the DHCP client.
DHCPDECLINE: A DHCP Decline message is sent by a DHCP client to notify the DHCP
server that the assigned IP address conflicts with another IP address. The DHCP client then
applies to the DHCP server for another IP address.
DHCPRELEASE: A DHCP Release message is sent by a DHCP client to release its IP
address. After receiving a DHCP Release message, the DHCP server can assign this IP
address to another DHCP client.
DHCPINFORM: A DHCP Inform message is sent by a DHCP client to obtain other network
configuration parameters, such as the gateway address and DNS server address, after the
DHCP client has obtained an IP address.
DHCPv4 options
The options field in a DHCP message carries control information and parameters that are
not defined in common protocols. When a DHCP client requests an IP address from a
DHCP server that has been configured to encapsulate the options field, the server returns
a DHCP Reply packet containing the options field. Figure 1-617 shows the options field
format.
The options field consists of the sub-fields Type, Length, and Value. Table 1-163 describes
these sub-fields.
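The Type/Length/Value layout can be illustrated with a short Python parser. This is a sketch, not the device implementation; it also checks the magic cookie 99.130.83.99 described above and stops at the End option (255):

```python
def parse_options(data):
    """Parse a DHCPv4 options field: a 4-byte magic cookie followed by
    Type(1)/Length(1)/Value(Length) entries; 0 is Pad, 255 is End."""
    if data[:4] != bytes([99, 130, 83, 99]):
        raise ValueError("bad magic cookie")
    options, i = {}, 4
    while i < len(data):
        opt = data[i]
        if opt == 0:          # Pad option: a single byte, no length field
            i += 1
            continue
        if opt == 255:        # End option: stop parsing
            break
        length = data[i + 1]
        options[opt] = data[i + 2:i + 2 + length]
        i += 2 + length
    return options

# Option 53 (DHCP message type) = 1 (DHCPDISCOVER), then End.
raw = bytes([99, 130, 83, 99, 53, 1, 1, 255])
print(parse_options(raw))   # {53: b'\x01'}
```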
The type value of the options field ranges from 1 to 255. Table 1-164 lists common
DHCPv4 options.
Options ID Description
1 Subnet mask
3 Gateway address
6 DNS address
15 Domain name
33 Group of classful static routes
After a DHCP client receives DHCP messages with this
option, it adds the classful static routes contained in the
option to its routing table. In classful routes, masks of
destination addresses are natural masks and cannot be used to
divide subnets. If Option 121 exists, Option 33 is ignored.
44 NetBIOS name
46 NetBIOS object type
50 Requested IP address
51 IP address lease
52 Additional option
53 DHCP message type
54 Server identifier
55 Parameter request list
The DHCP client uses this option to request specified
configuration parameters
58 Lease renewal time (Time1), which is 50% of the lease time
59 Lease rebinding time (Time2), which is 87.5% of the lease time
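The relationship between the lease and the two timers can be checked with a few lines of Python (illustrative; the function name is not a device term):

```python
def renewal_timers(lease_seconds):
    """Return (Time1, Time2): Time1 (Option 58) = 50% of the lease,
    Time2 (Option 59) = 87.5% of the lease."""
    return lease_seconds * 50 // 100, lease_seconds * 875 // 1000

# A one-day lease (86400 s): renew at 43200 s, rebind at 75600 s.
print(renewal_timers(86400))   # (43200, 75600)
```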
The Option 82 field is called the DHCP relay agent information field. It records the
location of a DHCP client. A DHCP relay agent or a DHCP snooping-enabled
device appends the Option 82 field to a DHCP Request message sent from a DHCP
client and forwards the message to a DHCP server.
Servers use the Option 82 field to learn the location of DHCP clients, implement
client security and accounting, and make parameter assignment policies, allowing
for more flexible address allocation.
The Option 82 field contains a maximum of 255 sub-options. If the Option 82 field
is defined, at least one sub-option must be defined. Currently, the device supports
only two sub-options: sub-option 1 (circuit ID) and sub-option 2 (remote ID).
The content of the Option 82 field is not uniformly defined, and vendors fill in the
Option 82 field as needed.
The device supports the following Option 82 field formats:
Type1: This is the Telecom format of Option 82.
Type2: This is the NMS format of Option 82.
Cn-telecom: This is the Option 82 format defined by China Telecom.
Self-define: This is the user-defined format of DHCP Option 82.
Three Modes for the Interaction Between the DHCP Client and Server
To obtain a valid dynamic IP address, a DHCP client exchanges different information with a
server at different stages. Generally, the DHCP client and server interact in the following
modes (defined in standard protocols):
A DHCP client accesses a network for the first time.
When a DHCP client accesses a network for the first time, the DHCP client undergoes
the following stages to set up a connection with a DHCP server:
− Discovering stage: At this stage, the DHCP client searches for a DHCP server. The
client broadcasts a DHCP Discover message and only DHCP servers respond to the
message.
− Offering stage: At this stage, each DHCP server offers an IP address to the DHCP
client. After receiving the DHCP Discover message from the client, each DHCP
server selects an unassigned IP address from the IP address pool and sends a DHCP
Offer message with the leased IP address and other settings to the client.
− Selecting stage: At this stage, the DHCP client selects an IP address. If multiple
DHCP servers send DHCP Offer messages to the client, the client accepts the first
received DHCP Offer message and broadcasts a DHCP Request message carrying
the selected IP address.
− Acknowledging stage: At this stage, the DHCP server confirms the IP address that
is offered. After receiving the DHCP Request message, the DHCP server sends a
DHCP ACK message to the client. The DHCP ACK message contains the offered IP
address and other settings. The DHCP client then binds its TCP/IP protocol suite to
the network interface card.
The IP addresses offered by the DHCP servers that the client did not select remain
available to other clients.
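The four stages above can be sketched as a toy exchange (illustrative Python only; class and message names are hypothetical, and real DHCP uses UDP broadcasts and the packet formats defined in RFC 2131):

```python
# Minimal sketch of the DHCP Discover/Offer/Request/Ack exchange.
# All names are illustrative; dicts and tuples stand in for real packets.

class DhcpServer:
    def __init__(self, pool):
        self.pool = list(pool)   # unassigned IP addresses
        self.offered = {}        # client id -> offered address

    def on_discover(self, client_id):
        """Offering stage: select an unassigned address and offer it."""
        addr = self.pool.pop(0)
        self.offered[client_id] = addr
        return ("OFFER", addr)

    def on_request(self, client_id, addr):
        """Acknowledging stage: confirm the address the client selected."""
        if self.offered.get(client_id) == addr:
            return ("ACK", addr)
        # The client selected another server's offer; our offer is
        # returned to the pool and stays available to other clients.
        self.pool.insert(0, self.offered.pop(client_id))
        return ("NONE", None)

def first_time_access(client_id, servers):
    """Discovering/Selecting stages: broadcast, accept the first offer."""
    offers = [(s, s.on_discover(client_id)[1]) for s in servers]
    chosen_server, chosen_addr = offers[0]          # first Offer wins
    replies = [s.on_request(client_id, chosen_addr) for s, _ in offers]
    acked = [a for t, a in replies if t == "ACK"]
    return acked[0] if acked else None

s1 = DhcpServer(["10.0.0.10", "10.0.0.11"])
s2 = DhcpServer(["10.0.1.10"])
addr = first_time_access("client-1", [s1, s2])
```

Running the sketch, the client keeps the first offered address, and the unselected server's address goes back to its pool.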
A DHCP client accesses a network for the second time.
When a DHCP client accesses a network for the second time, the DHCP client undergoes
the following stages to set up a connection with the DHCP server:
− If the client has previously accessed the network correctly, it does not broadcast a
DHCP Discover message. Instead, it broadcasts a DHCP Request message that
carries the previously-assigned IP address.
− After receiving the DHCP Request message, the DHCP server responds with a
DHCP ACK message if the IP address is not assigned, notifying the client that it can
continue to use the original IP address.
− If the IP address cannot be assigned to the client (for example, it has been assigned
to another client), the DHCP server responds with a DHCP NAK message to the
client. After receiving the DHCP NAK message, the client sends a DHCP Discover
message to apply for an IP address.
A DHCP client extends the IP address lease.
The IP address dynamically assigned to a client has a validity period. The server
withdraws the IP address after the validity period expires. If the client intends to continue
to use this IP address, it must extend the IP address lease.
In real-world implementations, the DHCP client sends a DHCP Request message to the
server automatically to update the IP address lease when the DHCP client is started or
half of the lease has passed. If the IP address is valid, the server replies with a DHCP
ACK message to inform the client of the new IP address lease.
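The renewal points described above (and shown in Figure 1-620 and Figure 1-621) can be computed directly; the 50% and 87.5% percentages follow the RFC 2131 defaults for the renewal (T1) and rebinding (T2) timers:

```python
# Sketch of the DHCP lease timers: the client unicasts a renewal Request
# to the original server at 50% of the lease (T1) and broadcasts a
# rebinding Request at 87.5% (T2). Values are in seconds.

def lease_timers(lease_seconds):
    t1_renewal = int(lease_seconds * 0.5)      # renew with original server
    t2_rebinding = int(lease_seconds * 0.875)  # broadcast to any server
    return t1_renewal, t2_rebinding

t1, t2 = lease_timers(86400)  # a one-day lease
```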
IP Address Reservation
DHCP supports IP address reservation for clients. The reserved IP addresses must belong to
the address pool. If an address in the address pool is reserved, it is no longer assignable.
Addresses are usually reserved for specific clients, such as DNS and WWW servers.
DHCP Client Requesting an IP Address Through a DHCP Relay Agent for the
First Time
Figure 1-619 shows the process of a DHCP client requesting an IP address through a DHCP
relay agent for the first time.
Figure 1-619 DHCP client requesting an IP address through a DHCP relay agent for the first time
1. When a DHCP client starts and initializes DHCP, it broadcasts the configuration request
packets (DHCP Discover messages) onto a local network.
After a DHCP relay agent connecting to the local network receives the broadcast packets,
it processes and forwards the packets to the specified DHCP server on another network.
2. After receiving the packets, the DHCP server sends the requested configuration
parameters in DHCP Offer messages to the DHCP client through the DHCP relay agent.
3. The DHCP client replies to the DHCP Offer message by broadcasting DHCP Request
messages.
Upon receipt, the DHCP relay agent sends the DHCP Request messages in unicast mode
to the DHCP server.
4. The DHCP server responds with a unicast DHCP ACK or DHCP NAK message through
the DHCP relay agent.
DHCP Client Extending the IP Address Lease Through the DHCP Relay Agent
An IP address dynamically assigned to a DHCP client usually has a validity period. The
DHCP server withdraws the IP address after the validity period expires. To continue using the
IP address, the DHCP client must renew the IP address lease.
The DHCP client enters the binding state after obtaining an IP address. The DHCP client has
three timers to control lease renewal, rebinding, and lease expiration. When assigning an IP
address to the DHCP client, the DHCP server can specify timer values. If the DHCP server
does not specify timer values, the default values are used. Table 1-165 describes the three
timers.
Figure 1-620 DHCP client extending the IP address lease by 50% through the DHCP relay agent
Figure 1-621 DHCP client extending the IP address lease by 87.5% through the DHCP relay agent
routes. If a DHCP server and a DHCP client reside on different VPNs, the DHCP relay agent
can transmit a DHCP Request message to the VPN where the DHCP server resides and
transmit a DHCP Reply message to the VPN where the DHCP client resides.
A DHCP relay agent can be deployed in CE1-PE1-PE2-CE2 networking, where the DHCP
server connects to one CE and the DHCP client connects to the other CE. Both CE1 and CE2
can belong to the same VPN or different VPNs.
DHCP Relay Agent Sending DHCP Release Messages to the DHCP Server
A DHCP relay agent can send a DHCP Release message, carrying an IP address to be released,
to the DHCP server.
When a DHCP client cannot send requests to the DHCP server to release its IP address, you
can configure the DHCP relay agent to release the IP address assigned by the DHCP server to
the client.
DHCP Relay Agent Setting the Priority of a DHCP Reply Message and TTL
Value of a DHCP Relay Message
A DHCP relay agent can set the priority of DHCP Reply messages. The priority of
low-priority DHCP Reply messages can be raised so that they will not be discarded on
access devices.
A DHCP relay agent can set the TTL value of DHCP Relay messages. The TTL value of
DHCP Relay messages can be increased to prevent the messages from being discarded
due to TTL becoming 0.
addresses, such as the DNS server address, NIS server address, and SNTP server
address.
− DHCPv6 Prefix Delegation (PD). IPv6 prefixes do not need to be manually
configured for the downstream routers. The DHCPv6 prefix delegation mechanism
allows a downstream router to send DHCPv6 messages carrying the IA_PD option
to an upstream router to apply for IPv6 prefixes. After the upstream router assigns a
prefix that has less than 64 bits to the downstream router, the downstream router
automatically subnets the delegated prefix into /64 prefixes and assigns the /64
prefixes to the links attached to IPv6 hosts through RA messages. This mechanism
implements automatic configuration of IPv6 addresses for IPv6 hosts and
hierarchical IPv6 prefix delegation.
DHCPv6 Architecture
Figure 1-622 shows the DHCPv6 architecture.
DHCPv6 server: processes address allocation, lease extension, and address release
requests originating from a DHCPv6 client or forwarded by a DHCPv6 relay agent and
assigns IPv6 addresses/prefixes and other configuration parameters to the client.
DHCPv6 messages share an identical fixed format header and a variable format area for
options.
Introduction
DHCPv6 Options
Introduction
DHCPv6 message types
Unlike DHCPv4 messages for which the message type is specified in the Message Type
option, DHCPv6 messages use the msg-type field in the header to identify the message
type. Table 1-166 lists the DHCPv6 message types.
msg-type: 1 byte. DHCP message type. The value ranges from 1 to 11. The available
message types are listed in Table 1-166.
transaction-id: 3 bytes. Transaction ID for this message exchange, indicating one
exchange of DHCPv6 messages.
options: Variable length. Options carried in this message.
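The fixed header layout above (a 1-byte msg-type followed by a 3-byte transaction-id, then the variable options area) can be parsed with a few lines of illustrative Python:

```python
# Parse the fixed DHCPv6 client/server message header: 1-byte msg-type,
# 3-byte transaction-id, then the variable-length options area.

def parse_dhcpv6_header(packet: bytes):
    msg_type = packet[0]
    transaction_id = int.from_bytes(packet[1:4], "big")
    options = packet[4:]
    return msg_type, transaction_id, options

# msg-type 1 (Solicit), transaction-id 0x0A0B0C, no options
msg_type, tid, opts = parse_dhcpv6_header(bytes([1, 0x0A, 0x0B, 0x0C]))
```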
Only Relay-forward and Relay-reply messages are exchanged between DHCPv6 relay
agents and servers. Figure 1-625 lists the fields of a DHCPv6 relay agent/server message.
DHCPv6 Options
DHCPv6 options format
Figure 1-625 shows the DHCPv6 options format.
implement client security and accounting, and make parameter assignment policies,
allowing for more flexible address allocation.
Table 1-170 lists the DHCPv6 relay options.
Overview
DHCPv6 relay agents relay DHCPv6 messages between DHCPv6 clients and servers that
reside on different network segments to facilitate dynamic address allocation. This function
enables a single DHCPv6 server to serve DHCPv6 clients on different network segments,
which reduces costs and facilitates centralized management.
A DHCPv6 relay agent relays both messages from clients and Relay-forward messages
from other relay agents. When a relay agent receives a valid message to be relayed, it
constructs a new Relay-forward message. The relay agent copies the received DHCP
message (excluding IP or UDP headers) into the Relay Message option in the new
message. If other options are configured on the relay agent, it also adds them to the
Relay-forward message. Table 1-171 lists the fields that a DHCPv6 relay agent can
encapsulate into a Relay-forward message.
Table 1-171 Fields that a DHCPv6 relay agent can encapsulate into a Relay-forward message
A DHCPv6 relay agent relays a Relay-reply message from a server. The relay agent
extracts the Relay Message option from a Relay-reply message and relays it to the
address contained in the peer-address field of the Relay-reply message. Table 1-172 lists
the fields that a DHCPv6 relay agent can encapsulate into a Relay-reply message.
Table 1-172 Fields that a DHCPv6 relay agent can encapsulate into a Relay-reply message
If a server does not have an address it can use to send a Reconfigure message directly to a client, the
server encapsulates the Reconfigure message into the Relay Message option of a Relay-Reply message
to be relayed by the relay agent to the client.
The Relay-Reply message must be relayed through the same relay agents as the original client message.
The server must be able to obtain the addresses of the client and all relay agents on the return path so it
can construct the appropriate Relay-reply message carrying the response.
DHCPv6 Client Applying for an IP Address Through a DHCPv6 Relay Agent for
the First Time
Figure 1-626 illustrates how a DHCPv6 client applies to a DHCPv6 server for an IP address
through a DHCPv6 relay agent for the first time.
Figure 1-626 DHCPv6 client applying to a DHCPv6 server for an IP address through a DHCPv6
relay agent for the first time
1. The DHCPv6 client sends a Solicit message to discover servers. The DHCPv6 relay
agent that receives the Solicit message constructs a Relay-forward message with the
Solicit message in the Relay Message option and sends the Relay-forward message to the
DHCPv6 server.
2. After the DHCPv6 server receives the Relay-forward message, it parses the Solicit
message and constructs a Relay-reply message with the Advertise message in the Relay
Message option. The DHCPv6 server then sends the Relay-reply message to the
DHCPv6 relay agent. The DHCPv6 relay agent parses the Relay Message option in the
Relay-reply message and sends the Advertise message to the DHCPv6 client.
3. The DHCPv6 client then sends a Request message to request IP addresses and other
configuration parameters. The DHCPv6 relay agent constructs a Relay-forward message
with the Request message in the Relay Message option and sends the Relay-forward
message to the DHCPv6 server.
4. After the DHCPv6 server receives the Relay-forward message, it parses the Request
message and constructs a Relay-reply message with the Reply message in the Relay
Message option. The Reply message contains the assigned IPv6 address and other
configuration parameters. The DHCPv6 server then sends the Relay-reply message to the
DHCPv6 relay agent. The DHCPv6 relay agent parses the Relay Message option in the
Relay-reply message and sends the Reply message to the DHCPv6 client.
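The Relay-forward/Relay-reply encapsulation used in steps 1 to 4 can be sketched as follows (illustrative Python; dicts stand in for real DHCPv6 packets, and all field names except the option code are hypothetical):

```python
# Sketch of DHCPv6 relay encapsulation: the relay agent wraps the client
# message in the Relay Message option of a Relay-forward message, and the
# server answers with a Relay-reply whose Relay Message option carries
# the response to be delivered to the client.

OPTION_RELAY_MSG = 9  # Relay Message option code (RFC 8415)

def relay_forward(client_msg, link_address, peer_address):
    """Build a Relay-forward message carrying the client message."""
    return {
        "msg_type": "RELAY-FORW",
        "link-address": link_address,  # identifies the client's link
        "peer-address": peer_address,  # address the client sent from
        "options": {OPTION_RELAY_MSG: client_msg},
    }

def relay_deliver(relay_reply):
    """Extract the Relay Message option and the address to send it to."""
    inner = relay_reply["options"][OPTION_RELAY_MSG]
    return relay_reply["peer-address"], inner

solicit = {"msg_type": "SOLICIT", "transaction_id": 0x123456}
fwd = relay_forward(solicit, "2001:db8:1::1", "fe80::abcd")

reply = {"msg_type": "RELAY-REPL", "peer-address": "fe80::abcd",
         "options": {OPTION_RELAY_MSG: {"msg_type": "ADVERTISE"}}}
dest, advertise = relay_deliver(reply)
```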
On the network shown in Figure 1-627, IPv6 prefixes do not need to be manually configured
for the CPEs. The DHCPv6 prefix delegation mechanism allows a CPE to apply for IPv6
prefixes by sending DHCPv6 messages carrying the IA_PD option to the DHCPv6 server.
After the DHCPv6 server assigns a prefix that has less than 64 bits to the CPE, the CPE
automatically subnets the delegated prefix into /64 prefixes and assigns the /64 prefixes to the
user network through RA messages. This mechanism implements automatic configuration of
IPv6 addresses for IPv6 hosts and hierarchical IPv6 prefix delegation.
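The subnetting step described above (a delegated prefix shorter than /64 split into /64 prefixes for the attached links) can be sketched with the Python standard library; the /56 prefix below is an example value:

```python
# Sketch of DHCPv6-PD subnetting: the CPE splits the delegated prefix
# into /64 prefixes and advertises them on attached links via RA messages.
import ipaddress

def subnet_delegated_prefix(prefix, count):
    """Return the first `count` /64 subnets of a delegated prefix."""
    delegated = ipaddress.ip_network(prefix)
    assert delegated.prefixlen < 64, "prefix must be shorter than /64"
    subnets = delegated.subnets(new_prefix=64)
    return [str(next(subnets)) for _ in range(count)]

links = subnet_delegated_prefix("2001:db8:abcd::/56", 3)
```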
If a DHCPv6 relay agent is deployed to forward DHCPv6 messages between CPEs (DHCPv6
clients) and the DHCPv6 server, the DHCPv6 relay agent must set up routes to the network
segments on which the clients reside and advertises these network segments after the
DHCPv6 server assigns PD prefixes to the clients. Otherwise, core network devices cannot
learn the routes destined for the CPEs, and IPv6 hosts cannot access the network. If a client
sends a Release message to the server to return a delegated prefix, or the lease of a delegated
prefix is not extended after expiration, the DHCPv6 relay agent deletes the network segment
of the client.
1.9.4.4 Applications
1.9.4.4.1 DHCPv4 Server Application
Service Overview
A DHCP server is used to assign IP addresses in the following scenarios:
Manual configuration takes a long time and complicates centralized management
on a large network.
Hosts on the network outnumber the available IP addresses. Therefore, not every host
can have a fixed IP address assigned. For example, if service providers (SPs) limit the
number of concurrent network access users, many hosts must dynamically obtain IP
addresses from the DHCP server.
Only a few hosts on the network require fixed IP addresses.
Networking Description
On a typical DHCP network, a DHCP server and multiple DHCP clients exist, such as PCs
and portable computers. DHCP uses the client/server model. A client applies to the server for
configuration parameters, such as an IP address, subnet mask, and default gateway address;
the server replies with the requested configuration parameters. Figure 1-628 shows typical
DHCP networking.
If a DHCP client and a DHCP server reside on different network segments, the client can obtain an IP
address and other configuration parameters from the server through a DHCP relay agent. For details
about DHCP relay, see 1.9.4.2.4 DHCPv4 Relay.
DHCPv4 and DHCPv6 relay applications are the same. The DHCP relay application described in this
section covers both DHCPv4 and DHCPv6 relay. However, DHCPv4 and DHCPv6 relay cannot be used
in the current version at the same time.
1.9.5 DNS
1.9.5.1 Introduction
Definition
Domain Name System (DNS) is a distributed database for TCP/IP applications that provides
conversion between domain names and IP addresses.
Purpose
DNS uses a hierarchical naming method to specify a meaningful name for each device on the
network and uses a resolver to establish mappings between IP addresses and domain names.
DNS allows users to use meaningful and easy-to-memorize domain names instead of IP
addresses to identify devices.
Benefits
When you check the continuity of a service, you can directly enter the domain name used to
access the service instead of the IP address. Even if the IP address used to access the service
has changed, you can still check continuity using the domain name, so long as the DNS server
has obtained the new IP address.
1.9.5.2 Principles
There are two complementary DNS methods: static and dynamic DNS. In domain name
resolution, static DNS is used first. If this method fails, dynamic DNS is used.
Related Concepts
Static DNS is implemented based on the static domain name resolution table. The mappings
between domain names and IP addresses recorded in the table are manually configured. You
can add common domain names to the table to improve resolution efficiency.
Implementation
A DNS client establishes the static domain name resolution table based on configured static
DNS data. The DNS client can then automatically convert entered domain names to IP
addresses, if the entered domain names can be found in the static domain name resolution
table. Statically configured DNS data does not age.
Usage Scenario
If no DNS server exists on a network or the required DNS entries are not stored on the DNS
server, use static DNS to resolve domain names.
Benefits
If there are not many hosts accessed by Telnet applications and the hosts do not change
frequently, using static DNS improves resolution efficiency.
Related Concepts
Dynamic DNS allows client programs, such as ping and tracert, to use the resolver of a DNS
client to access a DNS server.
Resolver: a component that provides the mapping between domain names and IP addresses
and handles user requests for domain name resolution.
Recursive resolution: If a DNS server cannot find the IP address corresponding to a
domain name, the DNS server turns to other DNS servers for help and sends the resolved
IP address to the DNS client.
Query type
− Class-A query: a query used to request the IPv4 address corresponding to a domain
name. This type of query is most commonly used in DNS resolution.
− Class-AAAA query: a query used to request the IPv6 address corresponding to a
domain name.
− PTR query: a query used to request the domain name corresponding to an IPv4
address.
Implementation
Dynamic DNS is implemented using the DNS server.
Figure 1-630 shows the relationship between the client program, resolver, DNS server, and
cache.
The DNS client is composed of the resolver and cache and is responsible for accepting and
responding to DNS queries from client programs. Generally, the client program, cache, and
resolver are on the same device, whereas the DNS server is on another device.
The implementation process is as follows:
1. A client program sends a request to the DNS client.
2. After receiving the request, the DNS client searches the local database or the cache. If
the required DNS entry is not found, the DNS client sends a query packet to the DNS
server. Currently, devices support the Class-A query, Class-AAAA query and PTR
query.
3. The DNS server searches its local database for the IP address corresponding to the
domain name carried in the query packet. If the corresponding IP address cannot be
found, the DNS server forwards the query packet to the upper-level DNS server for help.
The upper-level DNS server resolves the domain name in recursive resolution mode, as
specified in the query packet, and returns the resolution result to the DNS server. The
DNS server then sends the result to the DNS client.
4. After receiving the response packet from the DNS server, the DNS client sends the
resolution result to the client program.
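The resolution order described above (static table first, then the local cache, then a query to the DNS server) can be sketched as follows (illustrative Python; a plain dict stands in for a real recursive DNS server, and all addresses are example values):

```python
# Sketch of the DNS client resolution order: static DNS first, then the
# cache, then a query to the DNS server; server responses are cached.

STATIC_TABLE = {"router.example.com": "192.0.2.1"}  # manually configured

class DnsClient:
    def __init__(self, server):
        self.cache = {}
        self.server = server  # simulated DNS server database

    def resolve(self, name):
        if name in STATIC_TABLE:          # 1. static DNS first
            return STATIC_TABLE[name]
        if name in self.cache:            # 2. then the local cache
            return self.cache[name]
        addr = self.server.get(name)      # 3. then query the server
        if addr is not None:
            self.cache[name] = addr       # cache the response
        return addr

server_db = {"www.example.com": "198.51.100.7"}
client = DnsClient(server_db)
a1 = client.resolve("router.example.com")  # static hit
a2 = client.resolve("www.example.com")     # server query, then cached
```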
Dynamic DNS allows you to define a domain name suffix list by pre-configuring domain name
suffixes. After you enter a partial domain name, the device automatically combines it with each
suffix to form complete domain names for resolution.
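The suffix-completion behavior can be sketched as follows (illustrative only; the exact order in which candidates are tried is implementation specific, and the suffixes are example values):

```python
# Sketch of domain name suffix completion: a partial name is combined
# with each configured suffix to form candidate fully qualified names.

def complete_names(partial, suffix_list):
    """Yield the candidate names tried for a partial domain name."""
    if "." in partial:          # looks complete; try it as-is first
        yield partial
    for suffix in suffix_list:
        yield f"{partial}.{suffix}"

names = list(complete_names("host1", ["example.com", "lab.example.com"]))
```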
Usage Scenario
Dynamic DNS is used in scenarios in which a large number of mappings between domain
names and IP addresses exist and these mappings change frequently.
Benefits
If a large number of mappings between domain names and IP addresses exist, manually
configuring DNS entries on each DNS server is laborious. To solve this problem, use dynamic
DNS instead. Dynamic DNS effectively improves configuration efficiency and facilitates
DNS management.
1.9.5.3 Applications
If you want to use domain names to visit other devices, configure DNS. DNS entries record
the mappings between domain names and IP addresses. In Figure 1-631, client programs and
the DNS client are on the same device.
If you seldom use domain names to visit other devices or no DNS server is available,
configure static DNS on the DNS client. To configure static DNS, you must know the
mapping between domain names and IP addresses. If a mapping changes, manually
modify the DNS entry on the DNS client.
If you want to use domain names to visit many devices and DNS servers are available,
configure dynamic DNS. Dynamic DNS requires DNS servers.
1.9.6 MTU
1.9.6.1 What is MTU
Maximum transmission unit (MTU) defines the largest size of packets that an interface can
send without the need to fragment them. IP packets larger than the MTU are fragmented before
they are sent out of an interface.
MTU is used to limit frame lengths on the link layer. In fact, devices of different vendors and
even different product models of the same vendor have different MTU definitions.
Take the Ethernet as an example.
In some devices, the MTU configured on an Ethernet interface indicates the largest size
of the IP datagram in the Ethernet frame; that is, the MTU is a Layer 3 definition, known
as the IP MTU.
In some devices, the MTU = Data payload + Destination MAC + Source MAC + Length;
that is, the MTU = IP MTU + 14 bytes.
In other devices, the MTU = Data payload + Destination MAC + Source MAC + Length
+ CRC; that is, the MTU = IP MTU + 18 bytes.
In NE20E, the MTU is a Layer 3 definition. As shown in Figure 1, the MTU indicates the
largest size of the IP header + IP payload. If the MTU configured on an Ethernet interface
is 1500 bytes, a packet is not fragmented if the total length of its IP header and IP
payload is not larger than 1500 bytes.
The interfaces of NE20E support an MTU between 46 and 9600 bytes. Each interface
supports a default MTU.
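Under this Layer 3 definition, the fragmentation arithmetic can be sketched as follows (illustrative Python; a 20-byte IPv4 header without options is assumed, and fragment payloads other than the last must be multiples of 8 bytes):

```python
# Sketch of IPv4 fragmentation arithmetic: the MTU bounds IP header +
# IP payload, each fragment carries its own header, and every fragment
# payload except the last is aligned to 8 bytes.

def fragment_sizes(total_len, mtu, header_len=20):
    """Return the sizes (header + payload) of the fragments sent."""
    if total_len <= mtu:
        return [total_len]                     # no fragmentation needed
    max_payload = (mtu - header_len) // 8 * 8  # 8-byte alignment
    payload = total_len - header_len
    sizes = []
    while payload > 0:
        chunk = min(payload, max_payload)
        sizes.append(header_len + chunk)
        payload -= chunk
    return sizes

sizes = fragment_sizes(3000, 1500)  # a 3000-byte packet on a 1500 MTU link
```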
NOTE
Generally, only the source and destination nodes need to analyze IPv6 extension headers. Therefore,
fragmentation occurs only on the source node, which is different from IPv4.
Force-fragment
By default, when an IPv4 packet is longer than the interface MTU:
If DF=0, the packet is fragmented.
If DF=1, the packet is not permitted to be fragmented; the device drops the packet and
returns a Packet-too-big message.
NE20E supports the force-fragment function. If force-fragment is enabled, the board ignores
the DF bit: all large IPv4 packets (size > MTU) are fragmented, and the fragments are
forwarded with DF=0.
To enable the force-fragment function, run the ipv4 force-fragment enable command.
The force-fragment function takes effect only for IPv4 packets, not for other types of packets.
By default, the force-fragment function is not enabled.
If the size (including the IP header and payload) of a non-MPLS packet sent from the control
plane is greater than the MTU value configured on the outbound interface:
If the DF field is set to 0 in the packet, the packet is fragmented. The size of each fragment
is less than or equal to the interface MTU.
If the DF field is set to 1 in the packet, the packet is discarded.
If the DF field is set to 1 in the packet and the outbound interface is enabled with
forcible fragmentation, the packet is fragmented, and each fragment is forwarded with DF=0.
(By default, forcible fragmentation is not enabled for the control plane. To enable
forcible fragmentation for the control plane, run the clear ip df command on the outbound
interface.)
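The DF-bit decision rules above can be summarized in a short decision function (illustrative Python; return values are labels, not real forwarding actions):

```python
# Sketch of the outbound DF-bit decision: fragment when DF=0, drop when
# DF=1, but fragment anyway (clearing DF) when forcible fragmentation is
# enabled on the outbound interface.

def handle_packet(size, mtu, df, force_fragment=False):
    if size <= mtu:
        return "forward"         # fits; no fragmentation needed
    if df == 0:
        return "fragment"        # fragmented normally
    if force_fragment:
        return "fragment-df0"    # fragments forwarded with DF=0
    return "drop"                # DF=1 and no forcible fragmentation

actions = [
    handle_packet(3000, 1500, df=0),
    handle_packet(3000, 1500, df=1),
    handle_packet(3000, 1500, df=1, force_fragment=True),
]
```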
For information about fragmentation of MPLS packets, see chapter 1.9.6.3 MPLS
MTU Fragmentation.
Protocol packets are usually allowed to be fragmented (DF=0); that is, protocol packets
are usually not discarded on the originating device even when they exceed the MTU. Protocol
packets are not allowed to be fragmented (DF=1) only when:
the device is implementing PMTU discovery, such as IPv6 PMTU discovery or
LDP/RSVP-TE PMTU negotiation, or
the ping -f command is run on the local device.
Scenarios Parameters that may affect MPLS MTU value selection ("Y" indicates that the
parameter affects the selection, "N" indicates that it does not; the smallest value among
the affecting parameters is selected as the MPLS MTU)
LDP LSP Y Y Y N N
MPLS-TE Y Y N Y N
LDP over TE Y Y Y N Y
NOTE
In the LDP over TE scenario, the interface MTU of the tunnel interface affects MPLS MTU value
selection because the LDP LSP is established over a TE tunnel and the TE tunnel interface is an
outbound interface of the LDP LSP.
According to the preceding rules, the MPLS MTU selected on NE20E cannot be larger than
the physical interface MTU. Therefore, the size of an MPLS-labeled packet is less than or
equal to the physical interface MTU, and the packet will not be discarded by the local device
if DF=0.
An LSR compares the MTU values advertised by downstream LSRs as well as the MTU of the
outbound interface mapped to the local forwarding equivalence class (FEC) before advertising
the selected MTU value to the upstream LSR.
The default LDP MTU values vary according to types of LSRs along an LSP as follows:
The egress LSR uses the default MTU value of 65535.
The penultimate LSR assigned an implicit-null label uses the default LDP MTU equal to
the MTU of the local outbound interface mapped to the FEC.
Except the preceding LSRs, each LSR selects a smaller value as the local LDP MTU.
This value ranges between the MTU of the local outbound interface mapped to the FEC
and the MTU advertised by a downstream LSR. If an LSR receives no MTU from any
downstream LSR, the LSR uses the default LDP MTU value of 65535.
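The per-LSR selection rule above reduces to taking a minimum (illustrative Python; function and parameter names are hypothetical):

```python
# Sketch of the LDP MTU selection rule: an LSR takes the smaller of the
# MTU of its local outbound interface mapped to the FEC and the MTU
# advertised by the downstream LSR; 65535 is used when no downstream
# MTU has been received.

DEFAULT_LDP_MTU = 65535

def local_ldp_mtu(outbound_if_mtu, downstream_mtu=None):
    if downstream_mtu is None:          # no MTU received from downstream
        downstream_mtu = DEFAULT_LDP_MTU
    return min(outbound_if_mtu, downstream_mtu)

# Transit LSR: outbound interface MTU 9000, downstream advertised 1500.
mtu = local_ldp_mtu(9000, 1500)
```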
A downstream LSR adds the calculated LDP MTU value to the MTU type-length-value (TLV)
in a Label Mapping message and sends the Label Mapping message upstream.
If an MTU value changes (such as when the local outbound interface or its configuration is
changed), an LSR recalculates an MTU value and sends a Label Mapping message carrying
the new MTU value upstream. The comparison process repeats to update MTUs along the
LSP.
If an LSR receives a Label Mapping message that carries an unknown MTU TLV, the LSR
forwards this message to upstream LDP peers.
NE20E devices exchange Label Mapping messages to negotiate MPLS MTU values before
they establish LDP LSPs. Each message carries either of the following two MTU TLVs:
Huawei proprietary MTU TLV: sent by Huawei routers by default. If an LDP peer cannot
recognize this Huawei proprietary MTU TLV, the LDP peer forwards the message with
this TLV so that an LDP peer relationship can still be established between the Huawei
router and its peer.
Relevant standards-compliant MTU TLV: specified by commands on NE20E. NE20E
uses this MTU TLV to negotiate with non-Huawei devices.
1. The ingress sends a Path message with the ADSPEC object that carries an MTU value.
The smaller MTU value between the MTU configured on the physical outbound
interface and the configured MPLS MTU is selected.
2. Upon receipt of the Path message, a transit LSR selects the smallest MTU among the
received MTU value, the MTU configured on the physical outbound interface, and the
configured MPLS MTU. The transit LSR then sends a Path message with the ADSPEC
object that carries the smallest MTU value to the downstream LSR. This process repeats
until a Path message reaches the egress.
3. The egress uses the MTU value carried in the received Path message as the PMTU. The
egress then sends a Resv message that carries the PMTU value upstream to the ingress.
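The hop-by-hop shrinking of the MTU in the ADSPEC object across steps 1 to 3 can be sketched as follows (illustrative Python; MTU values are example numbers):

```python
# Sketch of RSVP-TE PMTU negotiation: each hop shrinks the MTU carried
# in the Path message's ADSPEC object to the smallest of the received
# value, its physical outbound interface MTU, and its configured MPLS
# MTU; the value reaching the egress becomes the PMTU.

def path_mtu(ingress_if_mtu, ingress_mpls_mtu, transit_hops):
    """transit_hops: list of (interface_mtu, mpls_mtu) per transit LSR."""
    mtu = min(ingress_if_mtu, ingress_mpls_mtu)   # step 1: ingress
    for if_mtu, mpls_mtu in transit_hops:         # step 2: transit LSRs
        mtu = min(mtu, if_mtu, mpls_mtu)
    return mtu                                    # step 3: egress PMTU

pmtu = path_mtu(1600, 1500, [(9000, 1508), (1400, 1500)])
```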
By default, Huawei routers implement MTU negotiation for VCs or PWs. Two nodes must
use the same MTU to ensure that a VC or PW is established successfully. L2VPN MTUs are
only used to establish VCs and PWs and do not affect packet forwarding.
To communicate with non-Huawei devices that do not verify L2VPN MTU consistency,
L2VPN MTU consistency verification can be disabled on NE20E. This allows NE20E to
establish VCs and PWs with the non-Huawei devices.
Definition
Load balancing distributes traffic among multiple links available to the same destination.
Purpose
After load balancing is deployed, traffic is distributed across different links. When one link
used in load balancing fails, traffic can still be forwarded over the other links.
Benefits
Load balancing offers the following benefits to carriers:
Maximized network resource usage
Increased link reliability
If the Forwarding Information Base (FIB) of a device has multiple entries with the same
destination address and mask but different next hops, outbound interfaces, or tunnel IDs, route
load balancing can be implemented.
Solution 1: Configure multiple equal-cost routes with the same destination network segment but
different next hops, and configure the maximum number of equal-cost routes for load balancing.
This solution is mostly used among links that directly connect two devices. However, as trunk
technology develops, this solution is being replaced by trunks. Compared with this solution,
trunk technology saves IP addresses and facilitates management by bundling links into a trunk.
Solution 2: Separate destination IP addresses into several groups and allocate one link for each group.
This solution improves the utilization of bandwidth resources. However, if you use this solution to
implement load balancing, you must observe and analyze traffic and know the distribution and
trends of traffic of various types.
By default, traffic on both the control plane and the forwarding plane is load-balanced per flow.
You can run a command to change the mode; see Configuring Load Balancing Mode.
ECMP
ECMP evenly load-balances traffic over multiple equal-cost paths to a destination,
irrespective of bandwidth. Equal-cost paths have the same cost to the destination.
When the bandwidths of these paths differ greatly, bandwidth usage is low. On the network
shown in Figure 1-645, traffic is load-balanced over three paths with bandwidths of 10 Mbit/s,
20 Mbit/s, and 30 Mbit/s, respectively. If ECMP is used, the total throughput can reach only
30 Mbit/s (three times the 10 Mbit/s of the narrowest path), so the maximum bandwidth usage
is only 50%.
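The arithmetic behind this example can be checked directly: with an even split, each path can carry at most the bandwidth of the narrowest path.

```python
# Sketch of the ECMP bandwidth-usage arithmetic: an even split caps each
# path's share at the bandwidth of the narrowest path.

def ecmp_usage(bandwidths_mbit):
    narrowest = min(bandwidths_mbit)
    carried = narrowest * len(bandwidths_mbit)  # total achievable throughput
    total = sum(bandwidths_mbit)
    return carried, carried / total

carried, usage = ecmp_usage([10, 20, 30])  # the Figure 1-645 example
```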
UCMP
UCMP load-balances traffic over multiple equal-cost paths to a destination based on
bandwidth ratios. All paths carry traffic in proportion to their bandwidths, as shown in
Figure 1-646. This increases bandwidth usage.
Trunk load balancing does not use ECMP or UCMP but provides similar functions. For example, if
interfaces of different rates, such as GE and FE interfaces, are bundled into a trunk interface, and
weights are assigned to the trunk member interfaces, traffic can be load-balanced over trunk member
links based on the link weights. This is similar to UCMP. By default, all trunk member interfaces
have the same weight of 1. This default behavior is similar to ECMP, but each member interface can
forward only as much traffic as the member with the lowest forwarding capability.
On the network shown in Figure 1-647, OSPF is used as the routing protocol.
− OSPF is configured on Device A, Device B, Device C, Device D, and Device E.
OSPF learns three different routes.
− Packets entering Device A through Port 1 and heading for Device E are sent to the
destination according to specific load balancing modes by the three routes,
implementing load balancing.
Unequal-cost load balancing
When equal-cost load balancing is performed, traffic is load-balanced over paths,
irrespective of the difference between link bandwidths. In this situation, low-bandwidth
links may be congested, whereas high-bandwidth links may be idle. Unequal-cost load
balancing can solve this problem by balancing traffic based on the bandwidths of the
outbound interfaces.
Load balancing modes and algorithms of equal-cost and unequal-cost load balancing are
the same.
The working mechanisms of equal-cost load balancing and unequal-cost load balancing
are similar. The difference is that unequal-cost load balancing carries bandwidth
information to the FIB and generates an NHP table according to the bandwidth ratio so
that load balancing can be performed based on the bandwidth ratio.
In Figure 1-647, after unequal-cost load balancing is enabled on Device A, traffic is
load-balanced based on the bandwidth ratio of the three outbound interfaces on Device A.
For example, if the bandwidths of the three outbound interfaces are 0.5 Gbit/s, 1 Gbit/s,
and 2.5 Gbit/s, respectively, traffic is load-balanced by these interfaces at the ratio of
1:2:5.
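The bandwidth-ratio split in this example can be computed as follows (illustrative Python; bandwidths are given in Mbit/s):

```python
# Sketch of the UCMP split: traffic is divided in proportion to outbound
# interface bandwidths (0.5, 1, and 2.5 Gbit/s -> ratio 1:2:5).
from fractions import Fraction

def ucmp_shares(bandwidths):
    """Fraction of traffic carried by each outbound interface."""
    total = sum(bandwidths)
    return [Fraction(b, total) for b in bandwidths]

shares = ucmp_shares([500, 1000, 2500])  # Mbit/s
```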
For details about unequal-cost load balancing, see 1.9.8 UCMP.
MPLS load balancing
When MPLS load balancing is performed, the NP checks the load balancing table and
then hashes packets to different load balancing items.
In Figure 1-648, two equal-cost LSPs exist between Device B and Device C so that
MPLS load balancing can be performed.
Multicast load balancing
Multicast load balancing can be configured based on the multicast source, multicast
group, or multicast priority.
Two-Level Hash
When the links connecting to next hops are trunk links, traffic that has been hashed by
protocol-based load balancing is hashed again based on the trunk forwarding table. This is
called two-level hashing.
In Figure 1-651, traffic is load balanced between Device A and Device B, and between Device
B and Device C. If the two load balancing processes use the same algorithm to calculate the
hash key, the same flow is always distributed to the same link. In this case, the forwarding of
the traffic is unbalanced.
Two-level load balancing works as follows:
A random number is introduced to the hash algorithm on each device. Random numbers vary
depending on devices, which ensures different hash results.
1.9.7.4.1.1 Overview
Huawei NE20E can implement load balancing using static routes and a variety of routing
protocols, including the Routing Information Protocol (RIP), RIP next generation (RIPng),
Open Shortest Path First (OSPF), OSPFv3, Intermediate System-to-Intermediate System
(IS-IS), and Border Gateway Protocol (BGP).
When multiple dynamic routes participate in load balancing, these routes must have the same metric. Because metrics can be compared only among routes of the same protocol, only routes of the same protocol can load-balance traffic.
Conditions
When the maximum number of static routes that load-balance traffic and the maximum
number of routes of all types that load-balance traffic are both greater than 1, the following
rules apply:
If N active static routes with the same prefix are available and N is less than or equal to
the maximum number of static routes that can be used to load-balance traffic, traffic is
load-balanced among the N static routes, regardless of whether they have the same cost.
If a static route is active and has N iterative next hops, traffic is load-balanced among N
routes, which is called iterative load balancing.
In Figure 1-652, R1 learns two OSPF routes to 172.1.1.2/32, both with the cost 2. The
outbound interface and next hop of one route are GE 1/0/0 and 172.1.1.34, and the outbound
interface and next hop of the other route are GE 2/0/0 and 172.1.1.38.
Conditions
If the maximum number of OSPF routes that can be used to load-balance traffic and the
maximum number of routes of all types that can be used to load-balance traffic are both
greater than 1 and multiple OSPF routes with the same prefix exist, these routes participate in
load balancing only when the following conditions are met:
These routes are of the same type (intra-area, inter-area, Type-1 external, or Type-2
external).
These routes have different direct next hops.
These routes have the same cost.
If these routes are Type-2 external routes, the costs of the links to the ASBR or
forwarding address are the same.
If OSPF route selection specified in relevant standards is implemented, these routes have
the same area ID.
The OSPF route selection rules specified in RFC 2328 are different from those specified in RFC 1583. By default, the Huawei NE20E performs OSPF route selection based on the rules specified in RFC 1583. To implement OSPF route selection based on the rules specified in RFC 2328, run the undo rfc1583 compatible command.
For details about configuring the maximum number of OSPF routes for load balancing, see Configuring Unicast Route Load Balancing.
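The conditions above can be expressed as a simple eligibility check. The field names below are assumptions for illustration, not the device's data model, and the area-ID and Type-2 ASBR-cost conditions are omitted for brevity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OspfRoute:
    route_type: str  # "intra", "inter", "type1-ext", or "type2-ext"
    cost: int
    next_hop: str

def eligible_for_load_balancing(routes):
    first = routes[0]
    # All routes must be of the same type and have the same cost.
    same_type_and_cost = all(
        r.route_type == first.route_type and r.cost == first.cost
        for r in routes)
    # All routes must have different direct next hops.
    distinct_next_hops = len({r.next_hop for r in routes}) == len(routes)
    return same_type_and_cost and distinct_next_hops

# The two routes from the Figure 1-652 example: same type and cost,
# different direct next hops, so they may share load.
r1 = OspfRoute("intra", 2, "172.1.1.34")
r2 = OspfRoute("intra", 2, "172.1.1.38")
```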
Principles
If the number of OSPF routes available for load balancing is greater than the configured
maximum number of OSPF routes that can be used to load-balance traffic, OSPF selects
routes for load balancing in the following order:
1. Routes whose next hops have smaller weights
Weight indicates the route preference, and the weight of a next hop can be changed using the nexthop command (in the OSPF view). Routing protocols and their default preferences are as follows:
DIRECT: 0
STATIC: 60
IS-IS: 15
OSPF: 10
OSPF ASE: 150
OSPF NSSA: 150
RIP: 100
IBGP: 255
EBGP: 255
2. Routes whose outbound interfaces have smaller indexes
Each interface has an index, which can be viewed using the display interface interface-name command in any view.
3. Routes whose next hop IP addresses are larger.
Conditions
If the maximum number of IS-IS routes that can be used to load-balance traffic and the
maximum number of routes of all types that can be used to load-balance traffic are both
greater than 1 and multiple IS-IS routes with the same prefix exist, these routes can participate
in load balancing only when the following conditions are met:
These routes are of the same level (Level-1, Level-2, or Level-1-2).
These routes are of the same type (internal or external).
These routes have the same cost.
These routes have different direct next hops.
For details about configuring the maximum number of IS-IS routes for load balancing, see Configuring Unicast Route Load Balancing.
Principles
If the number of IS-IS routes available for load balancing is greater than the configured
maximum number of IS-IS routes that can be used to load-balance traffic, IS-IS selects routes
for load balancing in the following order:
1. Routes whose next hops have smaller weights
Weight indicates the route preference, and the weight of a next hop can be changed using the nexthop command (in the IS-IS view). Routing protocols and their default preferences are as follows:
DIRECT: 0
STATIC: 60
IS-IS: 15
OSPF: 10
OSPF ASE: 150
OSPF NSSA: 150
RIP: 100
IBGP: 255
EBGP: 255
Each interface has an index, which can be viewed using the display interface interface-name command in any view.
6. Routes carrying IPv4, IPv6, and OSI next hop addresses, in descending order
7. Routes whose next hops have smaller IP addresses
8. If all the preceding items are the same, IS-IS selects the routes that are first calculated for
load balancing.
Conditions
Unlike an Interior Gateway Protocol (IGP), BGP imports routes from other routing protocols,
controls route advertisement, and selects optimal routes, rather than maintaining network
topologies or calculating routes by itself.
If the maximum number of BGP routes that can be used to load-balance traffic and the
maximum number of routes of all types that can be used to load-balance traffic are both
greater than 1, load balancing can be performed in either of the following modes:
Static routes or equal-cost IGP routes are used for BGP route iteration, and then traffic is
load-balanced among BGP routes.
BGP route attributes are modified to carry out load balancing.
In versions that support BGP independent route selection, BGP routes can be used to
load-balance traffic only when the following conditions are met:
− The PrefVal attributes of the BGP routes are the same.
For details about configuring the maximum number of BGP routes for load balancing, see Configuring Unicast Route Load Balancing.
Principles
If the number of BGP routes available for load balancing is greater than the configured
maximum number of BGP routes that can be used to load-balance traffic, BGP selects routes
for load balancing in the following order:
Routes with the shortest Cluster_List
Routes advertised by the routers with smaller router IDs. If the BGP routes carry
Originator_ID attributes, BGP selects the routes with smaller Originator_ID attributes
without comparing the router IDs.
Routes that are learned from BGP peers with smaller addresses
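The tie-breaking order above can be modeled as a sort key, as sketched below. The attribute names are illustrative assumptions, not BGP's wire format or the device's data model.

```python
import ipaddress
from dataclasses import dataclass

@dataclass
class BgpRoute:
    cluster_list: tuple
    router_id: str
    peer_addr: str
    originator_id: str = ""

def selection_key(route):
    # When an Originator_ID is carried, it is compared instead of the
    # advertising router's ID.
    rid = route.originator_id or route.router_id
    return (len(route.cluster_list),                # shortest Cluster_List first
            ipaddress.ip_address(rid),              # smaller (Originator_)ID first
            ipaddress.ip_address(route.peer_addr))  # smaller peer address first

routes = [
    BgpRoute(("1.1.1.1",), "10.0.0.1", "192.168.1.1"),
    BgpRoute((), "10.0.0.2", "192.168.1.2"),
]
preferred = sorted(routes, key=selection_key)
```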
After a multicast load balancing policy is configured, a multicast router selects equal-cost
routes in each routing table on the device, such as the unicast, MBGP, MIGP, and multicast
static routing tables. Based on the mask length and priority of each type of equal-cost routes,
the router selects a routing table on which multicast routing depends. Then, the router
implements load balancing among equal-cost routes in the selected routing table.
Load balancing can be implemented only between or among the same type of equal-cost routes. For
example, load balancing can be implemented between two MBGP routes but cannot be implemented
between an MBGP route and an MIGP route.
For details about configuring the maximum number of multicast routes for load balancing, see Configuring Multicast Route Load Balancing.
After the tunnel policy is applied to a VPN, the VPN selects tunnels based on the following
rules:
If two or more CR-LSPs are available, the VPN selects any two of them at random.
If fewer than two CR-LSPs are available, the VPN selects all CR-LSPs and also selects LSPs as substitutes to ensure that two tunnels are available for load balancing.
If two tunnels have been selected, one CR-LSP and the other LSP, and a CR-LSP is
added or a CR-LSP goes Up from the Down state, the VPN selects the CR-LSP to
replace the LSP.
If the number of existing tunnels for load balancing is smaller than the configured
number and a CR-LSP or LSP in the Up state is added, the newly added tunnel is also
used for load balancing.
If one or more tunnels used for load balancing go Down, the tunnel policy is triggered to
re-select tunnels. The VPN selects LSPs as substitutes to ensure that the configured
number of tunnels are used for load balancing.
The number of tunnels used for load balancing depends on the number of eligible tunnels.
For example, if there are only one CR-LSP and one LSP in the Up state, load balancing
is performed between the two tunnels. The tunnels of other types are not selected even if
they are Up.
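The selection rules above amount to preferring CR-LSPs and using LSPs only as substitutes, which can be sketched as follows. The tunnel names and the load-balancing number of two are illustrative.

```python
def select_tunnels(up_cr_lsps, up_lsps, count=2):
    # Prefer CR-LSPs; fill any remaining slots with LSP substitutes.
    chosen = list(up_cr_lsps[:count])
    chosen += up_lsps[:count - len(chosen)]
    return chosen

# Only one CR-LSP is up: an LSP substitutes for the second tunnel.
step1 = select_tunnels(["cr-lsp1"], ["lsp1", "lsp2"])
# A second CR-LSP comes up: re-selection replaces the LSP with it.
step2 = select_tunnels(["cr-lsp1", "cr-lsp2"], ["lsp1", "lsp2"])
```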
Figure 1-656 Tunnels used for load balancing do not necessarily have the same cost
Routes used for load balancing must go over different paths, whereas tunnels used for
load balancing can go over the same path.
On the network shown in Figure 1-657, if two routes are available from PE1 to PE2 for
load balancing, these two routes must go over different paths. If two tunnels are available
from PE1 to PE2 for load balancing, these tunnels can go over the same path.
Figure 1-657 Tunnels used for load balancing are allowed to go over the same path
Routes used for load balancing must have the same type, whereas tunnels used for load
balancing can have different types.
For example, between the two routes used for load balancing, if one is a static route, the
other cannot be an OSPF route. However, between the two tunnels used for load
balancing, if one is a CR-LSP, the other can be an LSP.
For details about configuring the maximum number of tunnels for load balancing, see Configuring Tunnel Load Balancing.
If a link in M links fails, LACP selects one from the N backup links to replace the faulty
one to retain M:N backup. The actual link bandwidth is still the sum of the bandwidths
of the M primary links.
If a link cannot be found in the backup links to replace the faulty link and the number of
member links in the Up state falls below the configured lower threshold of active links,
the Eth-Trunk interface goes Down. Then all member interfaces in the Eth-Trunk
interface no longer forward data.
An Eth-Trunk interface working in static LACP mode can contain member interfaces at
different rates, in different duplex modes, and on different boards. Eth-Trunk member
interfaces at different rates cannot forward data at the same time. Member interfaces in
half-duplex mode cannot forward data.
Manual load balancing mode: In this mode, you must manually create an Eth-Trunk
interface, add interfaces to the Eth-Trunk interface, and specify active member interfaces.
LACP is not involved. All active member interfaces forward data and perform load
balancing.
Traffic can be evenly load-balanced among all member interfaces. Alternatively, you can
set the weight for each member interface to implement uneven load balancing; in this
manner, the interface that has a larger weight transmits a larger volume of traffic. If an
active link in a link aggregation group fails, traffic is balanced among the remaining
active links evenly or based on weights, as shown in Figure 1-659.
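The weighted distribution and the re-balancing after a link failure can be sketched as follows; the member-link names and weights are hypothetical.

```python
def traffic_shares(weights):
    # Each member link carries traffic in proportion to its weight.
    total = sum(weights.values())
    return {link: w / total for link, w in weights.items()}

links = {"member1": 2, "member2": 1, "member3": 1}
before = traffic_shares(links)   # member1 carries half the traffic

del links["member2"]             # an active link fails
after = traffic_shares(links)    # traffic re-balances among the rest by weight
```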
An Eth-Trunk interface working in manual load balancing mode can contain member
interfaces at different rates, in different duplex modes, and on different boards.
For details about configuring the maximum number of Eth-Trunk member links for load balancing, see Configuring Eth-Trunk Load Balancing.
Hash Algorithm
The hash algorithm uses a hash function to map a binary value of any length to a smaller
binary value of a fixed length. The smaller binary value is the hash value. The device then
uses an algorithm to map the hash value to an outbound interface and sends packets out from
this outbound interface.
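A minimal sketch of this mapping, using a general-purpose hash in place of the device's algorithm (interface names are illustrative):

```python
import hashlib

def pick_interface(packet_bytes: bytes, interfaces):
    # Map input of any length to a fixed-length hash value...
    digest = hashlib.sha256(packet_bytes).digest()
    hash_value = int.from_bytes(digest[:4], "big")
    # ...then map the hash value to one outbound interface.
    return interfaces[hash_value % len(interfaces)]

ifaces = ["GE1/0/0", "GE2/0/0", "GE3/0/0"]
# Per-flow behavior: the same input always selects the same interface.
```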
Hash Factor
Traffic is hashed based on traffic characteristics, which are called hash factors.
Traffic characteristics that can be used as hash factors include but are not limited to the
following:
Ethernet frame header: source and destination MAC addresses
IP header: source IP address, destination IP address, and protocol number
TCP/UDP header: source and destination port numbers
MPLS header: MPLS label and some bits in the MPLS payload
L2TP packets: tunnel ID and session ID
For details about configuring load balancing hash factors, see Adjusting the Algorithm of Load Balancing. For the default hash factors of the hash algorithm in typical load balancing scenarios, see 1.9.7.6 Appendix: Default Hash Factors.
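The choice of hash factors per traffic type can be illustrated as follows. The packet fields are assumed to be already parsed, and the field names are illustrative.

```python
def hash_factors(pkt: dict) -> tuple:
    if pkt["type"] == "ip":        # IP header plus TCP/UDP ports: the 5-tuple
        return (pkt["src_ip"], pkt["dst_ip"], pkt["protocol"],
                pkt["src_port"], pkt["dst_port"])
    if pkt["type"] == "ethernet":  # Ethernet frame header: the MAC 2-tuple
        return (pkt["src_mac"], pkt["dst_mac"])
    if pkt["type"] == "l2tp":      # L2TP packets: tunnel ID and session ID
        return (pkt["tunnel_id"], pkt["session_id"])
    raise ValueError("no hash factors defined for this packet type")

factors = hash_factors({"type": "ip", "src_ip": "10.0.0.1",
                        "dst_ip": "10.1.0.1", "protocol": 6,
                        "src_port": 1024, "dst_port": 80})
```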
PE (Provider Edge): an edge device on the provider network, which is directly connected to the CE. The PE receives traffic from the CE, encapsulates the traffic with an MPLS header, and sends the traffic to the P. The PE also receives traffic from the P, removes the MPLS header, and sends the traffic to the CE.
P (Provider): a backbone device on the provider network, which is not directly connected
to the CE. Ps perform basic MPLS forwarding.
CE (Customer Edge): an edge device on the private network.
The hash algorithm is performed based on the packet format of the inbound traffic from the AC
interface. The hash factors can be IP 5-tuple or IP 2-tuple. The result of the load balancing
depends on the discreteness of the private IP addresses or TCP/UDP ports of the packets.
The hash algorithm on P node is performed based on the packet format of the inbound MPLS
traffic.
If the number of MPLS labels in the packet is less than four, the hash factors can be IP
5-tuple or IP 2-tuple. The result of the load balancing depends on the discreteness of the
private IP addresses or TCP/UDP ports of the packets.
In complex scenarios such as inter-AS VPN, FRR, and LDP over TE, the number of labels in the packet may be four or more. In these scenarios, the hash factors are the fourth or fifth label from the top. The result of the load balancing depends on the discreteness of the fourth or fifth labels of the packets.
The hash algorithm for load balancing on the egress PE is the same as in Scenario 2 if Penultimate Hop Popping (PHP) is disabled, and the same as in Scenario 1 if PHP is enabled.
Figure 1-668 Load Balancing Among the L3 Outbound Interfaces in the Access of L2VPN to
L3VPN Scenarios
In access of L2VPN to L3VPN scenarios, the hash algorithm is the same as Scenario 1.
PE (Provider Edge): an edge device on the provider network, which is directly connected to the CE. The PE receives Ethernet traffic from the CE, encapsulates the traffic with an MPLS header, and sends the traffic to the P. The PE also receives traffic from the P, removes the MPLS header, and sends the traffic to the CE.
P (Provider): a backbone device on the provider network, which is not directly connected
to the CE. Ps perform basic MPLS forwarding.
CE (Customer Edge): an edge device on the private network. CEs perform
Ethernet/VLAN layer2 forwarding.
The hash algorithm is performed based on the packet format of the inbound traffic from the AC
interface.
IP traffic: the hash factors can be IP 5-tuple or IP 2-tuple. The result of the load
balancing depends on the discreteness of the private IP addresses or TCP/UDP ports of
the packets.
Ethernet frames carrying non-IP traffic: the hash factors can be the MAC 2-tuple. The result of the load balancing depends on the discreteness of the MAC addresses of the packets. Some boards support the 3-tuple <source MAC, destination MAC, VC label> if the inbound AC traffic is MPLS traffic and the AC interface is not a QinQ sub-interface.
The hash algorithm on P node is performed based on the packet format of the inbound MPLS
traffic.
If the number of labels in the packet is less than four, the hash factors can be IP 5-tuple
or IP 2-tuple. The result of the load balancing depends on the discreteness of the private
IP addresses or TCP/UDP ports of the packets.
In complex scenarios such as inter-AS VPN, FRR, and LDP over TE, the number of labels may be four or more. In these scenarios, the hash factors are the fourth or fifth label from the top. The result of the load balancing depends on the discreteness of the fourth or fifth labels of the packets.
If the traffic is from MPLS to AC, the hash factors can be the IP 5-tuple, IP 2-tuple, or MAC 2-tuple. The default hash factors may differ between board types. Some boards support only the MAC 2-tuple.
If the traffic is from AC to AC, the hash algorithm is the same as Scenario 1.
Figure 1-676 Load Balancing Among the L2 Outbound Interfaces in the Access of L2VPN to
L3VPN Scenarios
In access of L2VPN to L3VPN scenarios, the hash algorithm is the same as Scenario 1.
PE (Provider Edge): an edge device on the provider network, which is directly connected to the CE. The PE receives traffic from the CE, encapsulates the traffic with an MPLS header, and sends the traffic to the P. The PE also receives traffic from the P, removes the MPLS header, and sends the traffic to the CE.
P (Provider): a backbone device on the provider network, which is not directly connected
to the CE. Ps perform basic MPLS forwarding.
CE (Customer Edge): an edge device on the private network.
The hash algorithm is performed based on the packet format of the inbound traffic from the AC
interface.
IP traffic: the hash factors can be IP 5-tuple or IP 2-tuple. The result of the load
balancing depends on the discreteness of the private IP addresses or TCP/UDP ports of
the packets.
Ethernet frames carrying non-IP traffic: the hash factors can be the MAC 2-tuple. The result of the load balancing depends on the discreteness of the MAC addresses of the packets.
Non-Ethernet traffic: the hash factor is the VC label on most boards.
The hash algorithm on P node is performed based on the packet format of the inbound MPLS
traffic.
If the number of labels in the packet is less than four, the hash factors can be IP 5-tuple
or IP 2-tuple. The result of the load balancing depends on the discreteness of the private
IP addresses or TCP/UDP ports of the packets.
In the complex scenarios such as inter-AS VPN, FRR and LDP over TE, the number of
the labels may be four or more. In these scenarios, the hash factors are the fourth or fifth
label from the top. The result of the load balancing depends on the discreteness of the
fourth or fifth label from the top.
The egress PE of VLL/PWE3 supports only trunk load balancing because the virtual circuit (VC) of VLL/PWE3 is P2P.
If the traffic is from AC to AC, the hash algorithm is the same as Scenario 1.
If the traffic is from MPLS to AC, the hash factors can be IP 5-tuple, IP 2-tuple or VC
label. The hash factors may be different in different board-types.
Figure 1-684 Load Balancing Among the L2 Outbound Interfaces in the Access of L2VPN to
L3VPN Scenarios
In access of L2VPN to L3VPN scenarios, the hash algorithm is the same as Scenario 1.
If the transit nodes of an L2TP tunnel use per-packet load balancing, the L2TP control messages may arrive out of order, which may cause L2TP tunnel establishment to fail.
Data message: used to transmit PPP frames over the L2TP tunnel. Data messages are not retransmitted if lost. The format of an L2TP data message is shown in Figure 1-687.
belongs to the same flow. The load balancing result depends on the number of L2TP tunnels carrying the traffic. The more L2TP tunnels, the better the load balancing result.
Figure 1-688 Transmitting data of multi-protocol local networks through the single-protocol
backbone network
In the scenarios stated above, the source and destination IP addresses of all packets in the GRE tunnel are the source and destination addresses of the GRE tunnel. Therefore, on any transit node and on the egress node of the GRE tunnel, the hash factors in the outer IP headers of the GRE packets are the same. If a flow is carried by only one GRE tunnel and the load balancing mode is per-flow, load balancing is not available. Creating multiple GRE tunnels to carry the flow is recommended.
The default hash factors of IP unicast traffic depend on the type of the inbound board.
Balance-Preferred
Based on this policy, a multicast router evenly distributes (*, G) and (S, G) entries among their corresponding equal-cost routes. This policy implements automatic load balancing adjustment in any of the following conditions: equal-cost routes are added, deleted, or modified; multicast routing entries are added or deleted; or the weights of equal-cost routes are changed.
This policy applies to a network on which multicast users frequently join or leave multicast
groups.
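A simplified sketch of the even distribution this policy aims for (not the router's actual algorithm): entries are spread over the equal-cost routes, and adding an equal-cost route triggers re-adjustment.

```python
from collections import Counter

def distribute(entries, routes):
    # Spread multicast entries evenly over the equal-cost routes, round-robin.
    return {entry: routes[i % len(routes)] for i, entry in enumerate(entries)}

entries = [("*", "G1"), ("S1", "G1"), ("*", "G2"), ("S2", "G2")]
two_routes = distribute(entries, ["route1", "route2"])
# An equal-cost route is added: the policy re-adjusts automatically.
three_routes = distribute(entries, ["route1", "route2", "route3"])
```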
Stable-Preferred
Based on this policy, a multicast router distributes (*, G) and (S, G) entries among their corresponding equal-cost routes. In this respect, the stable-preferred policy is similar to the balance-preferred policy. This policy implements automatic load balancing adjustment when equal-cost routes are deleted. However, dynamic load balancing adjustment is not performed when multicast routing entries are deleted or when the weights of load balancing routes change.
This policy applies to a network that has stable multicast services.
Based on the balance-preferred policy, a multicast router takes load balancing as the
most important issue, so that the router rapidly responds to a change in unicast routes,
multicast routes, and weights of equal-cost routes.
Based on the stable-preferred policy, a multicast router prevents unnecessary link switchovers to ensure stable services: the router rapidly responds to unicast route deletions but does not re-adjust load. After the unicast route flapping problem is resolved, the router selects optimal routes for subsequent services to gradually resolve the imbalance problem. Therefore, the stable-preferred policy provides both stable and load-balanced services.
Terms
None
1.9.8 UCMP
1.9.8.1 Introduction
Definition
Unequal cost multipath (UCMP) allows traffic to be distributed according to the bandwidth ratio of multiple unequal-cost paths that point to the same destination with the same precedence. All paths carry traffic in proportion to their bandwidths to achieve optimal load balancing.
Purpose
When equal-cost routes have multiple outbound interfaces that connect to both high-speed
links and low-speed links, equal cost multipath (ECMP) evenly distributes traffic among links
to a destination, regardless of the difference between link bandwidths. When the link
bandwidths differ greatly, low-bandwidth links may be congested, whereas high-bandwidth
links may be idle. To fully utilize bandwidths of different links, traffic must be balanced
according to the bandwidth ratio of these links.
1.9.8.2 Principles
1.9.8.2.1 Basic Principles
If multiple equal-cost routes reach the destination through multiple outbound interfaces,
bottom-layer hardware applies for resources according to the bandwidth ratio of these
interfaces so that the traffic ratio equals or approaches the bandwidth ratio on these interfaces.
When the bandwidth of an interface changes, traffic is automatically load-balanced according to the new bandwidth ratio.
Precautions
If interface-based UCMP is enabled, global UCMP cannot be enabled. Similarly, if
global UCMP is enabled, interface-based UCMP cannot be enabled.
The bandwidth accuracy for the interface board is Mbit/s, which supports high-speed
links.
You must run the shutdown and undo shutdown commands in sequence after enabling UCMP on an interface. As a result, traffic is interrupted. Global UCMP avoids this situation and provides more functions.
Precautions
If global UCMP is enabled, interface-based UCMP cannot be enabled. Similarly, if
interface-based UCMP is enabled, global UCMP cannot be enabled.
1.9.8.3 Applications
1.9.8.3.1 Interface-based UCMP Application
Device A has three physical outbound interfaces: Port 1, Port 2, and Port 3. The bandwidths of
the three interfaces are 10 Gbit/s, 1 Gbit/s, and 1 Gbit/s, respectively. Three IPv4 equal-cost
routes are available between Device A and Device B.
When UCMP is not enabled on the three interfaces, their traffic ratio is 1:1:1.
After UCMP is enabled on the three interfaces, the traffic ratio of the three interfaces
approaches the bandwidth ratio 10:1:1.
When UCMP is not enabled on the three interfaces, their traffic ratio is 1:1:1, irrespective of
the bandwidth ratio.
After global UCMP is enabled, traffic from Device A to Device B is load-balanced on the
three outbound interfaces, and the traffic ratio approaches the bandwidth ratio 3:1:1.
When a member interface of Eth-Trunk 1 is shut down, the bandwidth of Eth-Trunk 1 changes
to 2 Gbit/s and accordingly the bandwidth ratio of the three outbound interfaces is 2:1:1 for
load balancing.
When interfaces support UCMP, the bandwidths of equal-cost routes are displayed in the FIB
table. By calculating the bandwidth ratio of interfaces, you can see whether the bandwidth
ratio approaches the traffic ratio. In this way, you can learn whether UCMP functions
normally.
Terms
None
1.9.9 IPv4
1.9.9.1 Introduction
Definition
Internet Protocol version 4 (IPv4) is the core of the TCP/IP protocol suite and works at the
Internet layer in the TCP/IP model. This layer corresponds to the network layer in the OSI
model. At the IP layer, information is divided into data units, and address and control
information is added to allow datagrams to be routed.
IP provides unreliable and connectionless data transmission services. Unreliable transmission
means that IP does not ensure that IP datagrams successfully arrive at the destination. IP only
provides best-effort delivery. If an error occurs, for example, when a router exhausts its buffer, IP discards the excess datagrams and sends ICMP messages to the source. The upper-layer
protocols, such as TCP, are responsible for resolving reliability issues. Connectionless
transmission means that IP does not maintain status information for subsequent datagrams.
Every datagram is processed independently, meaning that IP datagrams may not be received
in the same order they are sent. If a source sends two consecutive datagrams A and B in
sequence to the same destination, each datagram is possibly routed over a different path to the
destination, and therefore B may arrive ahead of A.
On an IP network, each host must have an IP address for communication. An IP address is 32 bits long and consists of two parts: a network ID and a host ID.
The network ID uniquely identifies a network segment or the summarized network
segment of multiple network segments.
The host ID uniquely identifies a specific device on a network segment.
If multiple devices on the same network segment have the same network ID, they belong to
the same network, regardless of their physical locations.
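As a worked example of this split (the addresses and the 24-bit network ID are chosen for illustration), Python's ipaddress module can separate the two parts:

```python
import ipaddress

iface = ipaddress.ip_interface("192.168.10.7/24")
network_id = iface.network.network_address   # 192.168.10.0 identifies the segment
# The host ID is whatever the network mask does not cover.
host_id = int(iface.ip) & ~int(iface.network.netmask) & 0xFFFFFFFF

# A device with the same network ID belongs to the same network.
same_network = ipaddress.ip_address("192.168.10.200") in iface.network
```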
Purpose
IPv4 shields link layer protocol differences and provides a uniform standard for transmission
at the network layer.
1.9.9.2 Principles
1.9.9.2.1 ICMP
The Internet Control Message Protocol (ICMP) is an error-reporting mechanism and is used
by IP or an upper-layer protocol (TCP or UDP). An ICMP message is encapsulated as a part
of an IP datagram and transmitted through the Internet.
An IP datagram contains information about only the source and destination, not about all
nodes along the entire path through which the IP datagram passes. The IP datagram can record
information about all nodes along the path only when route record options are set in the IP
datagram. Therefore, if a device detects an error, it reports the error to the source and not to
intermediate devices.
When an error occurs during the IP datagram forwarding, ICMP reports the error to the source
of the IP datagram, but does not rectify the error or notify the intermediate devices of the error.
A majority of errors generally occur on the source. When an error occurs on an intermediate
device, however, the source cannot locate the device on which the error occurs even after
receiving the error report.
1.9.9.2.2 TCP
The Transmission Control Protocol (TCP) defined in standard protocols ensures
high-reliability transmission between hosts. TCP provides reliable, connection-oriented, and
full-duplex services for user processes. TCP transmits data through sequenced and
nonstructural byte streams.
TCP is an end-to-end, connection-oriented, and reliable protocol. TCP supports multiple
network applications. In addition, TCP assumes that the lower layer provides only unreliable
datagram services, and it can run over a network of different hardware structures.
Figure 1-698 shows the position of TCP in a layered protocol architecture, where TCP is
above IP. TCP can transmit variable-length data through IP encapsulation. IP then performs
data fragmentation and assembly and transmits the data over multiple networks.
TCP works below applications and above IP. Its upper-layer interface consists of a series of
calls similar to the interrupt call of an operating system.
TCP can asynchronously transmit data of upper-layer applications. The lower-layer interfaces
are assumed as IP interfaces. To implement connection-oriented and reliable data transmission
over unreliable networks, TCP must provide the following:
Reliability and flow control functions
Multiple interfaces for upper-layer applications
Data for multiple applications
Connection assurance
Communication security assurance
Figure 1-699 shows the process of setting up and tearing down a TCP connection.
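The setup and teardown shown in Figure 1-699 can be observed with an ordinary TCP socket pair on the loopback interface. This sketch only illustrates the connection-oriented byte-stream service, not the handshake internals.

```python
import socket
import threading

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))          # any free port
server.listen(1)
port = server.getsockname()[1]

def serve():
    conn, _ = server.accept()          # connection setup completes here
    conn.sendall(conn.recv(1024))      # reliable byte-stream echo
    conn.close()                       # begins connection teardown

t = threading.Thread(target=serve)
t.start()

client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"hello")
reply = client.recv(1024)
client.close()
t.join()
server.close()
```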
1.9.9.2.3 UDP
The User Datagram Protocol (UDP) is a computer communication protocol that provides
packet switching services on the Internet. By default, UDP uses IP as the lower-layer protocol.
UDP provides the simplest protocol mechanism that sends information to a user application.
UDP is transaction-oriented; delivery and duplicate protection are not guaranteed. Applications that require reliable data transmission must use TCP instead. Figure 1-700 shows the
format of a UDP datagram.
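UDP's connectionless, datagram-oriented behavior is visible with two loopback sockets: no connection is set up, and each receive call returns exactly one datagram.

```python
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"datagram-1", addr)     # sent without any handshake
data, _ = receiver.recvfrom(1024)      # datagram boundary is preserved
sender.close()
receiver.close()
```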
1.9.9.2.4 RawIP
RawIP only fills in certain fields of an IP header and allows an application to provide its own
IP header. Similar to UDP, RawIP is unreliable. No control mechanism is available to verify
whether a RawIP datagram is received. RawIP is connectionless, and it transmits data between hosts without establishing a virtual circuit of any type. Unlike UDP, RawIP allows application data to be
directly processed at the IP layer through a socket. This is helpful to the applications that need
to directly communicate with the IP layer.
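What "allows an application to provide its own IP header" means in practice can be sketched by building a 20-byte IPv4 header by hand, as a raw-socket application would before handing it to the IP layer. The field values are examples, and actually sending the header would require a privileged raw socket.

```python
import struct

def ipv4_header(src: str, dst: str, protocol: int, payload_len: int) -> bytes:
    version_ihl = (4 << 4) | 5                 # IPv4, 5 x 32-bit header words
    header = struct.pack("!BBHHHBBH4s4s",
                         version_ihl, 0, 20 + payload_len,
                         0, 0,                 # identification, flags/fragment
                         64, protocol, 0,      # TTL, protocol, checksum = 0
                         bytes(map(int, src.split("."))),
                         bytes(map(int, dst.split("."))))
    # Internet checksum: ones'-complement sum of the header's 16-bit words.
    csum = sum(struct.unpack("!10H", header))
    while csum >> 16:
        csum = (csum & 0xFFFF) + (csum >> 16)
    csum = ~csum & 0xFFFF
    return header[:10] + struct.pack("!H", csum) + header[12:]

hdr = ipv4_header("10.0.0.1", "10.0.0.2", 17, 8)   # protocol 17 = UDP
```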
1.9.9.2.5 Socket
A socket consists of a set of application programming interfaces (APIs) working between the
transport layer and application layer. The socket shields differences of transport layer
protocols and provides the uniform programming interfaces for the application layer. In this
manner, the application layer, being exempt from the detailed process of the TCP/IP protocol
suite, can transmit data over IP networks by calling socket functions. Figure 1-701 shows the
position of the socket in the TCP/IP protocol stack.
Figure 1-701 Schematic diagram of the socket in the TCP/IP protocol stack
The following types of sockets are supported by different protocols at the transport layer:
TCP-based socket: provides reliable byte-stream communication services for the
application layer.
UDP-based socket: supports connectionless and unreliable data transmission for the
application layer and preserves datagram boundaries.
RawIP socket: also called raw socket. Similar to the UDP-based socket, the RawIP
socket supports connectionless and unreliable data transmission and preserves datagram
boundaries. The RawIP socket is unique in that it can be used by applications to directly
access the network layer.
Link layer-based socket: used by Intermediate System to Intermediate System (IS-IS) to
directly access the link layer.
1.9.9.3 Applications
1.9.10 IPv6
1.9.10.1 Introduction
Definition
Internet Protocol version 6 (IPv6), also called IP Next Generation (IPng), is the
second-generation standard protocol of network layer protocols. As a set of specifications
defined by the Internet Engineering Task Force (IETF), IPv6 is the upgraded version of
Internet Protocol version 4 (IPv4).
The most significant difference between IPv6 and IPv4 is that IP addresses are lengthened
from 32 bits to 128 bits. Featuring a simplified header format, sufficient address space,
hierarchical address structure, flexible extended header, and an enhanced neighbor discovery
(ND) mechanism, IPv6 has a competitive future in the market.
Purpose
IP technology has become widely applied due to the great success of the IPv4 Internet. As the
Internet develops, however, IPv4 weaknesses have become increasingly obvious in the
following aspects:
The IPv4 address space is insufficient.
An IPv4 address is identified using 32 bits. In theory, a maximum of 4.3 billion
addresses can be provided. In actual applications, less than 4.3 billion addresses are
available because of address allocation. In addition, IPv4 address resources are allocated
unevenly. The USA occupies almost half of the global address space, Europe uses fewer
IPv4 addresses, while the Asian-Pacific region uses an even smaller quantity. The
shortage of IPv4 addresses limits further development of mobile IP and bandwidth
technologies that require an increasing number of IP addresses.
There are several solutions to IPv4 address exhaustion. Classless Inter-domain Routing
(CIDR) is one of them. CIDR, however, has its own disadvantages, which helped
encourage the development of IPv6.
The backbone router maintains too many routing entries.
In the initial IPv4 allocation planning, many discontinuous IPv4 addresses were allocated,
and therefore routes cannot be aggregated effectively. The constantly growing routing
table consumes significant memory, affecting forwarding efficiency. Subsequently,
device manufacturers have to upgrade routers to improve route addressing and
forwarding performance.
Address autoconfiguration and readdressing cannot be performed easily.
An IPv4 address only has 32 bits, and IP addresses are allocated unevenly. Consequently,
IP addresses must be reallocated during network expansion or replanning. Address
autoconfiguration and readdressing are required to simplify maintenance.
Security cannot be guaranteed.
As the Internet develops, security issues have become more serious. Security was not
fully considered in designing IPv4. Therefore, the original framework cannot implement
end-to-end security. An IPv6 packet contains a standard extension header related to IP
security (IPsec), which allows IPv6 to provide end-to-end security.
IPv6 solves the problem of IP address shortage and has the following advantages:
Easy to deploy.
Compatible with various applications.
Smooth transition from IPv4 networks to IPv6 networks.
With so many obvious advantages over IPv4, IPv6 has developed rapidly.
1.9.10.2 Principles
IPv6 basic functions include IPv6 neighbor discovery and path MTU (PMTU) discovery.
Neighbor discovery and PMTU discovery are implemented using Internet Control Message
Protocol for IPv6 (ICMPv6) messages.
X:X:X:X:X:X:X:X
− IPv6 addresses in this format are written as eight groups of four hexadecimal digits
(0 to 9, A to F), each group separated by a colon (:). Every "X" represents a group
of hexadecimal digits. For example, 2031:0000:130F:0000:0000:09C0:876A:130B
is a valid IPv6 address.
For convenience, any zeros at the beginning of a group can be omitted; therefore,
the given example becomes 2031:0:130F:0:0:9C0:876A:130B.
− Any number of consecutive groups of 0s can be replaced with two colons (::).
Therefore, the given example can be written as 2031:0:130F::9C0:876A:130B.
This double-colon substitution can only be used once in an address; multiple
occurrences would be ambiguous.
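These compression rules can be checked with Python's standard `ipaddress` module, which always emits the canonical compressed form (leading zeros dropped, the single longest run of zero groups replaced by "::", and lowercase hexadecimal digits):

```python
import ipaddress

# Full form with leading zeros and two runs of zero groups.
full = "2031:0000:130F:0000:0000:09C0:876A:130B"

# str() drops leading zeros and compresses the longest run of zero groups.
compressed = str(ipaddress.IPv6Address(full))
print(compressed)  # 2031:0:130f::9c0:876a:130b
```

Note that only the longer run (the two consecutive zero groups) is replaced by "::"; the single zero group stays as "0", which avoids the ambiguity described above.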
X:X:X:X:X:X:d.d.d.d
IPv4-mapped IPv6 address: The format of an IPv4-mapped IPv6 address is
0:0:0:0:0:FFFF:IPv4-address. IPv4-mapped IPv6 addresses are used to represent IPv4
node addresses as IPv6 addresses.
"X:X:X:X:X:X" represents the high-order six groups of digits, each "X" standing for 16
bits represented by hexadecimal digits. "d.d.d.d" represents the low-order four groups of
digits, each "d" standing for 8 bits represented by decimal digits. "d.d.d.d" is a standard
IPv4 address.
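The same module can parse an IPv4-mapped IPv6 address in the 0:0:0:0:0:FFFF:d.d.d.d format and recover the embedded IPv4 address:

```python
import ipaddress

# An IPv4-mapped IPv6 address in the ::FFFF:d.d.d.d form (example address).
mapped = ipaddress.IPv6Address("::ffff:192.0.2.1")

# The low-order 32 bits are recoverable as a plain IPv4 address.
print(mapped.ipv4_mapped)  # 192.0.2.1
```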
Application scenario: When a mobile host communicates with the mobile agent on the
home subnet, it uses the anycast address of the subnet router.
Address specifications: Anycast addresses do not have independent address space.
They can use the format of any unicast address. A syntax is therefore required to
differentiate an anycast address from a unicast address.
Multicast address: assigned to a set of interfaces that belong to different nodes and is
similar to an IPv4 multicast address. A packet that is sent to a multicast address is
delivered to all the interfaces identified by that address.
IPv6 addresses do not include broadcast addresses. In IPv6, multicast addresses can
provide the functions of broadcast addresses.
Unicast addresses can be classified into four types, as shown in Table 1-176.
Global unicast address: equivalent to an IPv4 public network address. Global unicast
addresses are used on links that can be aggregated, and are provided to the Internet
Service Provider (ISP). The structure of this type of address enables route-prefix
aggregation to solve the problem of a limited number of global routing entries. A global
unicast address consists of a 48-bit route prefix managed by operators, a 16-bit subnet ID
managed by local nodes, and a 64-bit interface ID. Unless otherwise specified, global
unicast addresses include site-local unicast addresses.
When network administrators need to specify or plan source and destination addresses of
packets, they can define a group of address selection rules. An address selection policy
table can be created based on these rules. Similar to a routing table, this table is queried
based on the longest matching rule. The address is selected based on the source and
destination addresses.
Select a source address using the following rules in descending order of priority:
a. Prefer a source address that is the same as the destination address.
b. Prefer an address in an appropriate address range.
c. Avoid selecting a deprecated address.
d. Prefer a home address.
e. Prefer an address of the outbound interface.
f. Prefer an address whose label value is the same as that of the destination address.
g. Use the longest matching rule.
The candidate address can be the unicast address that is configured on the specified outbound interface.
If a source address that has the same label value and is in the same address range with the destination
address is not found on the outbound interface, you can select such a source address from another
interface.
Select a destination address using the following rules in descending order of priority.
a. Avoid selecting an unavailable destination address.
b. Prefer an address in an appropriate address range.
c. Avoid selecting a deprecated address.
d. Prefer a home address.
e. Prefer an address whose label value is the same as that of the source address.
f. Prefer an address with a higher precedence value.
g. Prefer native transport to the 6over4 or 6to4 tunnel.
h. Prefer an address in a smaller address range.
i. Use the longest matching rule.
j. Leave the order of address priorities unchanged.
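Rule-ordered selection of this kind can be expressed as a tuple sort key, where earlier rules dominate later ones. The following sketch implements only a subset of the source-selection rules (a, c, e, f, and g); the candidate attributes (`deprecated`, `label`, `on_outbound_if`, `prefix_match`) are an illustrative data model, not the device's actual one.

```python
def select_source(candidates, dest):
    """Pick a source address by a subset of the selection rules above.

    Each candidate is a dict with illustrative fields: addr, deprecated,
    label, on_outbound_if, prefix_match (leading bits shared with dest).
    Tuples compare element by element, so earlier rules dominate later
    ones; False sorts before True, i.e. preferred conditions come first.
    """
    def key(c):
        return (
            c["addr"] != dest["addr"],    # a. prefer address equal to destination
            c["deprecated"],              # c. avoid deprecated addresses
            not c["on_outbound_if"],      # e. prefer the outbound interface
            c["label"] != dest["label"],  # f. prefer a matching label value
            -c["prefix_match"],           # g. longest matching rule
        )
    return min(candidates, key=key)
```

For example, given two candidates that tie on every rule except rule c, the non-deprecated address wins regardless of how they compare on the later rules.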
QoS
In an IPv6 header, the new Flow Label field specifies how to identify and process traffic.
The Flow Label field identifies a flow and allows a router to recognize packets in the
flow and to provide special processing.
QoS is guaranteed even for the packets encrypted with IPsec because the IPv6 header
can identify different types of flows.
Built-in security
An IPv6 packet contains a standard extension header related to IPsec, and therefore IPv6
can provide end-to-end security. This provides network security specifications and
improves interoperability between different IPv6 applications.
Fixed basic header
A fixed basic header helps improve forwarding efficiency.
Flexible extension header
An IPv4 header only supports the 40-byte Options field, whereas the size of the IPv6
extension header is limited only by the IPv6 packet size.
In IPv6, multiple extension headers are introduced to replace the Options field of the
IPv4 header. This improves packet processing efficiency, enhances IPv6 flexibility, and
provides better scalability for the IP protocol. Figure 1-704 shows an IPv6 extension
header.
When multiple extension headers are used in the same packet, the headers must be listed in
the following order:
IPv6 basic header
Hop-by-hop extension header
Destination options extension header
Routing extension header
Fragment extension header
Authentication extension header
Encapsulation security extension header
Destination options extension header (options to be processed at the destination)
Upper layer extension header
Not all extension headers must be examined and processed by routers. When a router
forwards packets, it determines whether or not to process the extension headers based on the
Next Header value in the IPv6 basic header.
The destination options extension header can appear twice in a packet: once before the
routing extension header and once before the upper-layer header. All other extension headers
appear only once.
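The Next Header chaining can be illustrated with a small parser. The protocol numbers below are the standard IANA values; the length handling follows the generic extension-header layout, in which the second byte gives the header length in 8-octet units, not counting the first 8 octets. The sample payload is constructed for illustration.

```python
# Standard IANA protocol numbers for IPv6 extension and upper-layer headers.
EXT_HEADERS = {0: "Hop-by-Hop Options", 43: "Routing", 44: "Fragment",
               50: "ESP", 51: "AH", 60: "Destination Options"}
UPPER_LAYER = {6: "TCP", 17: "UDP", 58: "ICMPv6"}

def walk_headers(first_next_header, payload):
    """Return the chain of headers reached by following Next Header values."""
    chain, nh, off = [], first_next_header, 0
    while nh in EXT_HEADERS:
        chain.append(EXT_HEADERS[nh])
        if nh == 50:                         # ESP: the rest is encrypted, stop
            return chain
        next_nh = payload[off]               # byte 0: next header
        off += (payload[off + 1] + 1) * 8    # byte 1: length in 8-octet units
        nh = next_nh
    chain.append(UPPER_LAYER.get(nh, "Unknown ({})".format(nh)))
    return chain

# A Hop-by-Hop header (8 bytes) followed by a Destination Options header
# (8 bytes) and then ICMPv6, matching the ordering rules above.
payload = bytes([60, 0] + [0] * 6 + [58, 0] + [0] * 6)
chain = walk_headers(0, payload)
print(chain)  # ['Hop-by-Hop Options', 'Destination Options', 'ICMPv6']
```

This also illustrates why a forwarding router can skip most extension headers: it only needs to read each Next Header byte to decide whether the header concerns it.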
1.9.10.2.3 ICMPv6
Internet Control Message Protocol for IPv6 (ICMPv6) is a basic IPv6 protocol that uses error
or informational messages to report errors and information generated during packet
processing. Figure 1-705 shows the ICMPv6 message format.
Neighbor Discovery
Similar to ARP in IPv4, IPv6 ND parses the neighbor addresses and detects the availability of
neighbors based on NS and NA messages.
When a node needs to obtain the link-layer address of another node on the same local link, it
sends an ICMPv6 type 135 NS message. An NS message is similar to an ARP request
message in IPv4, but it is destined for a solicited-node multicast address rather than a
broadcast address. Only a node whose address has the same last 24 bits as those encoded in
the multicast address receives the NS message, which reduces the possibility of broadcast
storms. The destination node fills its link-layer address in the NA message.
An NS message is also used to detect the availability of a neighbor when the link-layer
address of the neighbor is known. An NA message is the response to an NS message. After
receiving an NS message, a destination node responds with an ICMPv6 type 136 NA message
on the local link. After receiving the NA message, the source node can communicate with the
destination node. When the link-layer address of a node on the local link changes, the node
actively sends an NA message.
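The solicited-node multicast address targeted by an NS message is formed by appending the last 24 bits of the target address to the well-known prefix FF02::1:FF00:0/104. The following sketch computes it for an illustrative address:

```python
import ipaddress

SOLICITED_NODE_PREFIX = int(ipaddress.IPv6Address("ff02::1:ff00:0"))

def solicited_node(address):
    """Build the solicited-node multicast address for a unicast address."""
    low24 = int(ipaddress.IPv6Address(address)) & 0xFFFFFF  # last 24 bits
    return ipaddress.IPv6Address(SOLICITED_NODE_PREFIX | low24)

print(solicited_node("2001:db8::4567:89ab"))  # ff02::1:ff67:89ab
```

Because only the last 24 bits of the target address select the multicast group, only nodes sharing those bits process the NS message, which is the filtering effect described above.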
Router Discovery
Router discovery is used to locate a neighboring router and learn the address prefix and
configuration parameters related to address autoconfiguration. IPv6 router discovery is
implemented based on the following messages:
RS message
When a host is not configured with a unicast address, for example, when the system has
just started, it sends an RS message. An RS message helps the host rapidly perform
address autoconfiguration without waiting for the RA message that is periodically sent
by an IPv6 device. An RS message is of the ICMPv6 type 133.
RA message
Interfaces on each IPv6 device periodically send RA messages only when they are
enabled to do so. After a router receives an RS message from an IPv6 device on the local
link, the router responds with an RA message. An RA message is sent to the all-nodes
multicast address (FF02::1) or to the IPv6 unicast address of the node that sent the RS
message. An RA message is of the ICMPv6 type 134 and contains the following
information:
− Whether or not to use address autoconfiguration
− Supported autoconfiguration type: stateless or stateful
− One or more on-link prefixes (On-link nodes can perform address autoconfiguration
using these address prefixes.)
− Lifetime of the advertised on-link prefixes
− Whether the router sending the RA message can be used as a default router (If so,
the lifetime of the default router is also included, expressed in seconds.)
− Other information about the host, such as the hop limit and the MTU that specifies
the maximum size of the packet initiated by a host
After an IPv6 host on the local link receives an RA message, it extracts the preceding
information to obtain the updated default router list, prefix list, and other configurations.
Address Autoconfiguration
A router can notify hosts of how to perform address autoconfiguration using RA messages and
prefix flags. For example, the router can specify stateful or stateless address autoconfiguration
for the hosts.
When stateless address autoconfiguration is employed, a host uses the prefix information in a
received RA message and local interface ID to automatically form an IPv6 address, and sets
the default router according to the default router information in the message.
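With stateless autoconfiguration, a host commonly derives its 64-bit interface ID from the interface MAC address using the modified EUI-64 method (split the MAC in half, insert FF FE in the middle, and flip the universal/local bit), then combines it with the advertised /64 prefix. The prefix and MAC address below are illustrative:

```python
import ipaddress

def eui64_interface_id(mac):
    """Modified EUI-64: insert FF FE into the middle of the 48-bit MAC and
    flip the universal/local bit (0x02) of the first byte."""
    b = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    eui = bytes([b[0] ^ 0x02]) + b[1:3] + b"\xff\xfe" + b[3:6]
    return int.from_bytes(eui, "big")

def slaac_address(prefix, mac):
    """Combine an advertised /64 on-link prefix with the interface ID."""
    network = ipaddress.IPv6Network(prefix)
    return ipaddress.IPv6Address(int(network.network_address)
                                 | eui64_interface_id(mac))

print(slaac_address("2001:db8::/64", "00:1b:44:11:3a:b7"))
# 2001:db8::21b:44ff:fe11:3ab7
```

Note that real hosts may instead use randomized interface IDs for privacy; the EUI-64 derivation shown here is only one stateless option.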
To counter these threats, standard protocols specify security mechanisms to extend ND.
Standard protocols define Cryptographically Generated Addresses (CGAs), CGA option, and
Rivest Shamir Adleman (RSA) Signature option, which are used to ensure that the sender of
an ND message is the owner of the message's source address. Standard protocols also define
Timestamp and Nonce options to prevent replay attacks.
CGA: contains an IPv6 interface identifier that is generated from a one-way hash of the
public key and associated parameters.
CGA option: contains information used to verify the sender's CGA, including the public
key of the sender. CGA is used to authenticate the validity of source IP addresses carried
in ND messages.
RSA option: contains the hash value of the sender's public key and contains the digital
signature generated from the sender's private key and ND messages. RSA is used to
authenticate the completeness of ND messages and the identity of the ND message
sender.
For an attacker to use an address that belongs to an authorized node, the attacker must use the public key
of the authorized node for encryption. Otherwise, the receiver can detect the attempted attack after
checking the CGA option. Even if the attacker obtains the public key of the authorized node, the receiver
can still detect the attempted attack after checking the digital signature, which is generated from the
sender's private key.
Timestamp option: a 64-bit unsigned integer field containing a timestamp. The value
indicates the number of seconds since 00:00 UTC on January 1, 1970. This option protects
unsolicited advertisement messages and Redirect messages and ensures that the timestamp
of a recently received message is the latest.
Nonce option: contains a random number selected by the sender of a solicitation message.
This option prevents replay attacks during message exchange. For example, a sender
sends an NS message carrying the Nonce option and receives an NA message as a
response that also carries the Nonce option; the sender verifies the NA message based on
the Nonce option.
To reject insecure ND messages, an interface can have the IPv6 SEND function configured.
An ND message that meets any of the following conditions is insecure:
The received ND message does not carry the CGA or RSA option, which indicates that
the interface sending this message is not configured with a CGA.
The key length of the received ND message exceeds the length limit that the interface
supports.
The rate at which ND messages are received exceeds the system rate limit.
The time difference between the sent and received ND messages exceeds the time
difference allowed by the interface.
Because the router implementation complies with standard protocols, the key-hash field in the RSA
Signature option of ND packets is generated using the SHA-1 algorithm. SHA-1 has been proven
insufficiently secure.
IPv6 packets, which reduces transmission efficiency. If the source node uses the minimum
IPv6 MTU of 1280 bytes as the maximum fragment length, in most cases, the PMTU is
greater than the minimum IPv6 MTU of the link, and the fragments sent by a node are always
smaller than the PMTU. As a result, network resources are wasted. To resolve this problem,
the PMTU discovery mechanism is introduced.
PMTU Principles
PMTU discovery is the process of determining the minimum IPv6 MTU on the path between the
source and the destination. The PMTU discovery mechanism dynamically discovers
the PMTU of a path. When an IPv6 node has a large amount of data to send to another node,
the data is transmitted in a series of IPv6 fragments. When these fragments are of the
maximum length allowed in successful transmission from the source node to the destination
node, the fragment length is considered optimal and called PMTU.
A source node assumes that the PMTU of a path is the known IPv6 MTU of the first hop on
the path. If any packets sent on that path are too large to be forwarded, the transit node
discards these packets and returns an ICMPv6 Packet Too Big message to the source node.
The source node sets the PMTU of the path based on the IPv6 MTU carried in the received message.
When the PMTU learned by the source node is less than or equal to the actual PMTU, the
PMTU discovery process is complete. Before the process is complete,
ICMPv6 Packet Too Big messages may be repeatedly sent and received because there may
be links with smaller MTUs further along the path.
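The iterative behavior described above can be sketched as a simulation. The list of per-link MTUs is illustrative; each iteration models one round trip in which a transit node rejects an oversized packet and the source lowers its PMTU estimate.

```python
IPV6_MIN_MTU = 1280  # IPv6 requires every link to support at least 1280 bytes

def discover_pmtu(link_mtus):
    """Simulate PMTU discovery along a path of per-link IPv6 MTUs."""
    pmtu = link_mtus[0]                        # initial guess: first-hop MTU
    while True:
        for mtu in link_mtus:
            if pmtu > mtu:                     # too big for this link: the
                pmtu = max(mtu, IPV6_MIN_MTU)  # transit node reports its MTU
                break                          # source retries with smaller packets
        else:
            return pmtu                        # packets traversed the whole path

print(discover_pmtu([1500, 1400, 1300]))  # 1300
```

With the MTU sequence above, discovery takes two rounds: the estimate drops from 1500 to 1400, then from 1400 to 1300, matching the "repeatedly sent or received" behavior described in the text.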
Multiple applications, such as DNS, FTP, and Telnet, support the dual stack. The upper
layer applications, such as the DNS, use TCP or UDP as the transmission layer protocol
and prefer the IPv6 protocol stack rather than the IPv4 protocol stack as the network
layer protocol.
1. On the border router, IPv4/IPv6 dual stack is enabled, and an IPv6 over IPv4 tunnel is
configured.
2. After the border router receives a packet from the IPv6 network, if the destination
address of the packet is not the border router and the outbound interface of the next hop
is a tunnel interface, the border router appends an IPv4 header to the IPv6 packet to
encapsulate it as an IPv4 packet.
3. On the IPv4 network, the encapsulated packet is transmitted to the remote border router.
4. The remote border router receives the packet, removes the IPv4 header, and then sends
the decapsulated IPv6 packet to the remote IPv6 network.
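Step 2 can be sketched as prepending a minimal IPv4 header whose Protocol field is 41 (the standard value for IPv6 encapsulated in IPv4). The checksum is left as a zero placeholder, and the addresses are illustrative; a real border router computes the checksum and manages fragmentation and TTL.

```python
import socket
import struct

def encapsulate_6in4(ipv6_packet, src_v4, dst_v4):
    """Prepend a minimal IPv4 header (protocol 41 = IPv6-in-IPv4)."""
    total_len = 20 + len(ipv6_packet)
    hdr = struct.pack("!BBHHHBBH4s4s",
                      0x45, 0, total_len,   # version/IHL, TOS, total length
                      0, 0,                 # identification, flags/frag offset
                      64, 41, 0,            # TTL, protocol 41, checksum placeholder
                      socket.inet_aton(src_v4), socket.inet_aton(dst_v4))
    return hdr + ipv6_packet

# A minimal 40-byte IPv6 header (version nibble 6) as the inner packet.
pkt = encapsulate_6in4(b"\x60" + b"\x00" * 39, "192.0.2.1", "198.51.100.2")
print(len(pkt))  # 60
```

The remote border router in step 4 performs the inverse operation: it checks that the Protocol field is 41, strips the first 20 bytes, and forwards the inner IPv6 packet.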
IPv6 over IPv4 tunnels are classified into IPv6 over IPv4 manual tunnels and
IPv6-to-IPv4 (6to4) tunnels in different application scenarios.
The following describes the characteristics and applications of each.
IPv6-to-IPv4 Tunnel
A 6to4 tunnel can connect multiple isolated IPv6 sites through an IPv4 network. A 6to4 tunnel
can be a P2MP connection, whereas a manual tunnel is a P2P connection. Therefore, routers
on both ends of a 6to4 tunnel do not need to be configured in pairs.
A 6to4 tunnel uses a special IPv6 address, a 6to4 address in the format of 2002:IPv4
address:subnet ID:interface ID. A 6to4 address has a 48-bit prefix composed of 2002:IPv4
address. The IPv4 address is the globally unique IPv4 address applied for by an isolated IPv6 site.
This IPv4 address must be configured on the physical interfaces connecting the border routers
between IPv6 and IPv4 networks to the IPv4 network. The IPv6 address has a 16-bit subnet
ID and a 64-bit interface ID, which are assigned by users in the isolated IPv6 site.
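Deriving the 48-bit 6to4 route prefix from an IPv4 address is a simple calculation, shown in the following sketch. The well-known 6to4 relay anycast IPv4 address 192.88.99.1 yields exactly the anycast prefix discussed below.

```python
import ipaddress

def to_6to4_prefix(ipv4):
    """Derive the 2002:<IPv4>::/48 route prefix from an IPv4 address."""
    v4 = int(ipaddress.IPv4Address(ipv4))
    return ipaddress.IPv6Network(
        "2002:{:04x}:{:04x}::/48".format(v4 >> 16, v4 & 0xFFFF))

print(to_6to4_prefix("192.88.99.1"))  # 2002:c058:6301::/48
```

The 16-bit subnet ID and 64-bit interface ID assigned within the site then fill the remaining 80 bits of each 6to4 address.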
When a 6to4 tunnel is used for communication between a 6to4 network and a native
IPv6 network, you can configure an anycast address with the prefix 2002:c058:6301::/48 on
the tunnel interface of the 6to4 relay device.
The difference between a 6to4 address and an anycast address is as follows:
If a 6to4 address is used, you must configure different addresses for tunnel interfaces of
all devices.
If an anycast address is used, you must configure the same address for the tunnel
interfaces of all devices, effectively reducing the number of addresses.
A 6to4 network refers to a network on which all nodes are configured with 6to4 addresses. A
native IPv6 network refers to a network on which nodes do not need to be configured with
6to4 addresses. A 6to4 relay is required for communication between 6to4 networks and native
IPv6 networks.
1.9.10.2.8 TCP6
Transmission Control Protocol version 6 (TCP6) provides a mechanism to establish virtual
circuits between processes on two endpoints. A TCP6 virtual circuit is similar to a
full-duplex circuit that transmits data between systems. TCP6 provides reliable data
transmission between processes and is therefore known as a reliable protocol. TCP6 also provides a
mechanism to optimize transmission performance according to the network status. When all
data can be received and acknowledged, the transmission rate increases gradually. Delay
causes the sending host to reduce the sending rate before it receives Acknowledgement
packets.
TCP6 is generally used in interactive applications, such as the web application. Certain errors
in data receiving affect the normal operation of devices. TCP6 establishes virtual circuits
using the three-way handshake mechanism, and all virtual circuits are deleted through the
four-way handshake. TCP6 connections provide multiple checksums and reliability functions,
but increase cost. As a result, TCP6 has lower efficiency than User Datagram Protocol version
6 (UDP6).
Figure 1-709 shows the establishment and teardown of a TCP6 connection.
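A minimal loopback exchange illustrates the lifecycle: the three-way handshake happens inside `connect()`/`accept()`, and the teardown inside `close()`. This sketch assumes an IPv6 loopback (::1) is available on the host.

```python
import socket
import threading

def echo_once(server):
    # Accept one connection and echo the first message back.
    conn, _ = server.accept()
    conn.sendall(conn.recv(64))
    conn.close()

# AF_INET6 selects the IPv6 protocol stack.
server = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
server.bind(("::1", 0))            # IPv6 loopback, ephemeral port
server.listen(1)
worker = threading.Thread(target=echo_once, args=(server,))
worker.start()

client = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
client.connect(("::1", server.getsockname()[1]))  # three-way handshake
client.sendall(b"hello over TCP6")
reply = client.recv(64)
client.close()                     # connection teardown
worker.join()
server.close()
print(reply)  # b'hello over TCP6'
```

The reliability and flow-control mechanisms described above operate transparently underneath these calls; the application only sees an ordered byte stream.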
1.9.10.2.9 UDP6
User Datagram Protocol version 6 (UDP6) is a computer communications protocol used to
exchange packets on a network. UDP6 has the following characteristics:
UDP6 only uses source and destination information and is mainly used in simple
request/response exchanges.
UDP6 is unreliable because no control mechanism is available to ensure that
UDP6 datagrams reach their destinations.
UDP6 is connectionless, meaning that no virtual circuits are required during data
transmission between hosts.
The connectionless feature of UDP6 enables it to send data to multicast addresses. This is
different from TCP6, which requires specific source and destination addresses.
1.9.10.2.10 RawIP6
RawIP6 fills only a limited number of fields in the IPv6 header, and allows application
programs to provide their own IPv6 headers.
RawIP6 is similar to UDP6 in the following aspects:
1.10 IP Routing
1.10.1 About This Document
Purpose
This document describes the IP Routing feature in terms of its overview, principles, and
applications.
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
Network planning engineers
Commissioning engineers
Data configuration engineers
System maintenance engineers
Security Declaration
Encryption algorithm declaration
The encryption algorithms DES/3DES/SKIPJACK/RC2/RSA (RSA-1024 or
lower)/MD2/MD4/MD5 (in digital signature scenarios and password encryption)/SHA1
(in digital signature scenarios) provide low security and may bring security risks. If the
protocols allow it, using more secure encryption algorithms, such as AES/RSA
(RSA-2048 or higher)/SHA2/HMAC-SHA2, is recommended.
Password configuration declaration
− Do not set both the start and end characters of a password to "%^%#". This causes
the password to be displayed directly in the configuration file.
− To further improve device security, periodically change the password.
Special Declaration
This document serves only as a guide. The content is written based on device
information gathered under lab conditions. The content provided by this document is
intended to be taken as general guidance, and does not cover all scenarios. The content
provided by this document may be different from the information on user device
interfaces due to factors such as version upgrades and differences in device models,
board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are
beyond the scope of this document.
The maximum values provided in this document are obtained in specific lab
environments (for example, only a certain type of board or protocol is configured on a
tested device). The actually obtained maximum values may be different from the
maximum values provided in this document due to factors such as differences in
hardware configurations and carried services.
Interface numbers used in this document are examples. Use the existing interface
numbers on devices for configuration.
The pictures of hardware in this document are for reference only.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
Indicates an imminently hazardous situation which, if not
avoided, will result in death or serious injury.
Change History
Updates between document issues are cumulative. Therefore, the latest document issue
contains all updates made in previous issues.
Changes in Issue 03 (2017-09-20)
This issue is the third official release. The software version of this issue is
V800R009C10SPC200.
Changes in Issue 02 (2017-07-30)
This issue is the second official release. The software version of this issue is
V800R009C10SPC100.
Changes in Issue 01 (2017-05-30)
This issue is the first official release. The software version of this issue is
V800R009C10.
Definition
As a basic concept on data communication networks, routing is the process of relaying or
forwarding packets, and it provides route information for packet forwarding.
Purpose
During data forwarding, routers, routing tables, and routing protocols are indispensable.
Routing protocols are used to discover routes and contribute to the generation of routing
tables. Routing tables store the routes discovered by various routing protocols, and routers
select routes and implement data forwarding.
1.10.2.2 Principles
1.10.2.2.1 Routers
On the Internet, network connection devices control network traffic and ensure data
transmission quality on networks. Common network connection devices include hubs, bridges,
switches, and routers.
As a standard network connection device, a router is used to select routes and forward packets.
Based on the destination address in the received packet, a router selects a path to send the
packet to the next router. The last router is responsible for sending the packet to the
destination host. In addition, a router can select an optimal path for data transmission.
For example, in Figure 1-710, traffic from Host A to Host C needs to pass through three
networks and two routers. The hop count from a router to its directly connected network is
zero. The hop count from a router to a network that the router can reach through another
router is one, and so on. If a router is connected to another router
through a network, a network segment exists between the two routers, and they are considered
adjacent on the Internet. In Figure 1-710, the bold arrows indicate network segments. The
routers do not need to know about the physical link composition of each network segment.
Network sizes may vary greatly, and the actual lengths of network segments vary as well.
Therefore, you can set a weighted coefficient for the network segments of each network and
then measure the cost of a route based on the number of network segments.
A route with the minimal network segments is not necessarily optimal. For example, a route
passing through three high-speed Local Area Network (LAN) network segments may be a
better choice than one passing through two low-speed Wide Area Network (WAN) network
segments.
Each router that supports Layer 3 Virtual Private Network (L3VPN) maintains a management routing
table (local core routing table) for each VPN instance.
The Preference is used during the selection of routes discovered by different routing protocols, whereas
the Cost is used during the selection of routes discovered by the same routing protocol.
Flags:
Route flag:
− R: indicates an iterated route.
− D: indicates a route that is downloaded to the FIB.
− T: indicates a route whose next hop belongs to a VPN instance.
− B: indicates a black-hole route.
Next hop: indicates the IP address of the next router through which an IP packet passes.
Interface: indicates the outbound interface that forwards an IP packet.
Based on the destination addresses, routes can be classified into the following types:
Network segment route: The destination is a network segment.
Host route: The destination is a host.
In addition, based on whether the destination is directly connected to the router, route types
are as follows:
Route Priority
Dynamic routes of different routing protocols and static routes may have the same destination,
but not all these routes are optimal. At a certain moment, only one routing protocol determines
the current route to a certain destination. To select the optimal route, each routing protocol
and static route are configured with priorities, and the route with the highest priority becomes
the optimal route. Table 1-178 lists routing protocols and their default priorities.
In Table 1-178, 0 indicates a direct route, and 255 indicates any route learned from unreliable
sources. The smaller the value, the higher the priority.
Routing Protocol    Default Priority
Direct              0
OSPF                10
IS-IS               15
Static              60
RIP                 100
OSPF ASE            150
OSPF NSSA           150
IBGP                255
EBGP                255
The priorities of routing protocols can be configured, except for direct routes. In addition, the
priority of each static route can be different.
The NE20E defines external and internal priorities. The external priority refers to the priority
set by users for each routing protocol. Table 1-178 lists the default external priorities.
When different routing protocols are configured with the same priority, the system selects the
optimal route based on the internal priority. For the internal priority of each routing protocol,
see Table 1-179.
Routing Protocol    Internal Priority
Direct              0
OSPF                10
IS-IS Level-1       15
IS-IS Level-2       15
Static              60
RIP                 100
OSPF ASE            150
For example, two routes, an OSPF route and a static route, can reach 10.1.1.0/24, and the
priorities of the two routes are set to 5. In this case, the NE20E selects the optimal route based
on the internal priorities listed in Table 1-179. The internal priority of OSPF (10) is higher
than that of the static route (60). Therefore, the system selects the route discovered by OSPF
as the optimal route.
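The two-level comparison in this example can be sketched as a sort over (external priority, internal priority) pairs, where a smaller value means a higher priority. The default priority values come from the tables above; the candidate data model is illustrative.

```python
# Default external and internal priorities (smaller value = higher priority).
EXTERNAL_DEFAULT = {"Direct": 0, "OSPF": 10, "IS-IS": 15, "Static": 60,
                    "RIP": 100, "OSPF ASE": 150, "OSPF NSSA": 150,
                    "IBGP": 255, "EBGP": 255}
INTERNAL = {"Direct": 0, "OSPF": 10, "IS-IS": 15, "Static": 60,
            "RIP": 100, "OSPF ASE": 150}

def select_route(candidates):
    """Pick the optimal route among (protocol, external_priority) pairs.

    A None external priority means the user did not configure one, so the
    protocol's default applies. Ties on the external priority fall back to
    the internal priority.
    """
    def key(c):
        protocol, external = c
        if external is None:
            external = EXTERNAL_DEFAULT[protocol]
        return (external, INTERNAL.get(protocol, 255))
    return min(candidates, key=key)

# Both routes to 10.1.1.0/24 are configured with external priority 5;
# OSPF wins on the internal priority (10 < 60).
print(select_route([("OSPF", 5), ("Static", 5)]))  # ('OSPF', 5)
```

With default priorities, a static route (60) would instead beat a RIP route (100) on the external comparison alone, without consulting the internal table.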
Definition
Priority-based route convergence is an important technology that improves network reliability.
It provides faster route convergence for key services. For example, to minimize the
interruption of key services in case of network faults, real-time multicast services require that
the routes to the multicast source quickly converge, and the Multiprotocol Label Switching
(MPLS) VPN bearer network requires that routes between PEs also quickly converge.
Convergence priorities provide references for the system to converge routes for service
forwarding. Different routes can be assigned different convergence priorities, which are,
in descending order, critical, high, medium, and low.
Purpose
With the integration of network services, requirements on service differentiation increase.
Carriers require that the routes for key services, such as Voice over IP (VoIP) and video
conferencing services, converge faster than those for common services. Therefore, routes need
to converge based on their convergence priorities to improve network reliability.
For VPN route priorities, only 32-bit host routes of OSPF and IS-IS are identified as medium, and the
other routes are identified as low.
Applications
Figure 1-712 shows networking for multicast services. An IGP runs on the network; Device A
is the receiver, and Device B is the multicast source server with IP address 10.10.10.10/32.
The route to the multicast source server is required to converge faster than other routes, such
as 12.10.10.0/24. In this case, you can set a higher convergence priority for 10.10.10.10/32
than that of 12.10.10.0/24. Then, when routes converge on the network, the route to the
multicast source server 10.10.10.10/32 converges first, ensuring the transmission of multicast
services.
Load Balancing
The NE20E supports the multi-route model (multiple routes with the same destination and
priority). Routes discovered by one routing protocol with the same destination and cost can
load-balance traffic. In each routing protocol view, you can run the maximum
load-balancing number command to perform load balancing. Load balancing can work
per-destination or per-packet.
The number of equal-cost routes for load balancing varies with products.
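Per-destination load balancing can be sketched as follows. This is a conceptual illustration, not NE20E forwarding code; the hash function and path cap are assumptions (the cap corresponds to the value set with the maximum load-balancing command):

```python
# Sketch of per-destination load balancing over equal-cost routes:
# a hash of the destination picks one of the next hops, so all
# packets to one destination take the same path.
import zlib

def pick_next_hop(dst_ip, next_hops, max_paths=4):
    paths = next_hops[:max_paths]   # cap set by 'maximum load-balancing'
    idx = zlib.crc32(dst_ip.encode()) % len(paths)
    return paths[idx]

hops = ["192.0.2.1", "192.0.2.2"]
# Same destination -> same next hop every time (per-destination mode).
assert pick_next_hop("10.10.10.10", hops) == pick_next_hop("10.10.10.10", hops)
```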
Route Backup
The NE20E supports route backup to improve network reliability. You can configure multiple
routes to the same destination as required. The route with the highest priority functions as the
primary route, and the other routes with lower priorities function as backup routes.
In most cases, the NE20E uses the primary route to forward packets. If the link used by the
primary route fails, the primary route becomes inactive, and the NE20E selects the backup
route with the highest priority to forward packets; a primary-to-backup switchover occurs.
When the original primary route recovers, the NE20E reselects the optimal route. Because the
original primary route has the highest priority, the NE20E selects it again to send packets, and
traffic switches back from the backup route to the primary route.
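The primary/backup selection and switchback behavior can be sketched as follows (a minimal model, not NE20E code; the priority values are illustrative):

```python
# Sketch of route backup: the active route is the highest-priority
# route whose link is up; when the primary fails the backup takes
# over, and traffic switches back on recovery.
def active_route(routes):
    """routes: dicts with 'priority' (smaller = higher) and 'up'."""
    usable = [r for r in routes if r["up"]]
    return min(usable, key=lambda r: r["priority"]) if usable else None

primary = {"name": "primary", "priority": 10, "up": True}
backup = {"name": "backup", "priority": 60, "up": True}

assert active_route([primary, backup])["name"] == "primary"
primary["up"] = False                       # primary link fails
assert active_route([primary, backup])["name"] == "backup"
primary["up"] = True                        # link recovers: switchback
assert active_route([primary, backup])["name"] == "primary"
```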
Overview
Fast Reroute (FRR) functions when the lower layer (physical layer or data link layer) detects a
fault. The lower layer reports the fault to the upper layer routing system and immediately
forwards packets through a backup link.
If a link fails, FRR helps reduce the impact of the link failure on services transmitted on the
link.
Background
On traditional IP networks, when a fault occurs at the lower layer of the forwarding link, the
physical interface on the router goes Down. After the router detects the fault, it instructs the
upper layer routing system to recalculate routes and then update routing information. The
routing system takes several seconds to reselect an available route.
For services that are sensitive to packet loss and delay, a convergence time of several seconds
is intolerable because it may lead to service interruptions. For example, the maximum
convergence time tolerable for Voice over IP (VoIP) services is measured in milliseconds. IP FRR
enables the forwarding system to detect a fault and then to take measures to restore services as
soon as possible.
The static routes that are imported between public and private networks do not support IP FRR.
If the forwarding engine detects that the primary link is unavailable after IP FRR between
different protocols is enabled, the system can use the backup link to forward traffic before the
routes converge on the control plane.
Feature Description
IP FRR: Implements FRR through a backup route. IP FRR is applicable to networks where a
primary link and a backup link exist and load balancing is not configured.
Load balancing: Implements fast route switching through equal-cost routes and applies to
multi-link networking with load balancing.
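The FRR idea, that the forwarding plane holds a precomputed backup next hop and switches before the control plane reconverges, can be sketched as follows (a conceptual model with illustrative names, not NE20E internals):

```python
# Sketch of FRR: each FIB entry carries a precomputed backup next
# hop, so a link-down event switches traffic immediately, without
# waiting for route recomputation on the control plane.
fib = {"10.10.10.10/32": {"primary": "linkA", "backup": "linkB"}}
link_up = {"linkA": True, "linkB": True}

def forward(prefix):
    entry = fib[prefix]
    return entry["primary"] if link_up[entry["primary"]] else entry["backup"]

assert forward("10.10.10.10/32") == "linkA"
link_up["linkA"] = False                     # lower layer detects the fault
assert forward("10.10.10.10/32") == "linkB"  # immediate switch to backup
```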
Definition
Indirect next hop is a technique used to speed up route convergence. This technique can
change the direct association between route prefixes and next hop information into an indirect
association. Indirect next hop allows next hop information to be refreshed independently of
the prefixes of the same next hop, which speeds up route convergence.
Purpose
In the scenario requiring route iteration, when IGP routes or tunnels are switched, forwarding
entries are rapidly refreshed, which implements fast route convergence and reduces the impact
of route or tunnel switching on services.
Iteration Policy
An iteration policy is used to control the iteration result of the next hop to meet requirements
of different scenarios. In route iteration, behaviors do not need to be controlled by the
iteration policy. Instead, iteration behaviors only need to comply with the longest match rule.
In addition, the iteration policy needs to be applied only when VPN routes are iterated to
tunnels.
By default, the system selects Label Switched Paths (LSPs) for VPNs without performing
load balancing. If load balancing or other types of tunnels are required, configure a tunnel
policy and bind it to a tunnel. After the tunnel policy is applied, the system uses the tunnel
bound to the tunnel policy or selects a tunnel based on the priorities specified in the tunnel
policy during next hop iteration.
As shown in Figure 1-714, without indirect next hop, prefixes are totally independent, each
corresponding to its next hop and forwarding information. When a dependent route changes,
the next hop corresponding to each prefix is iterated and forwarding information is updated
based on the prefix. In this case, the convergence time is decided by the number of prefixes.
Note that prefixes of a BGP peer have the same next hop, forwarding information, and
refreshed forwarding information.
As shown in Figure 1-715, with indirect next hop, prefixes of routes from the same BGP peer
share the same next hop. When a dependent route changes, only the shared next hop is
iterated and forwarding information is updated based on the next hop. In this case, routes of
all prefixes can converge at a time. Therefore, the convergence time is irrelevant to the
number of prefixes.
In Figure 1-716, an IBGP peer relationship is established between Device A and Device D.
The IBGP peer relationship is established between two loopback interfaces on the routers, but
the next hop cannot be used to guide packet forwarding, because it is not directly reachable.
Therefore, to refresh the forwarding table and guide packet forwarding, the system needs to
search for the actual outbound interface and directly connected next hop based on the original
IBGP next hop.
Device D receives 100,000 routes from Device A. These routes have the same original BGP
next hop. After being iterated, these routes eventually follow the same IGP path (A->B->D).
If the IGP path (A->B->D) fails, these IBGP routes do not need to be iterated separately, and
the relevant forwarding entries do not need to be refreshed one by one. Note that only the
shared next hop needs to be iterated and refreshed. Consequently, these IBGP routes converge
to the path (A->C->D) on the forwarding plane. Therefore, the convergence time depends on
only the number of next hops, not the number of prefixes.
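The benefit of the shared next-hop object can be sketched as follows (an illustration of the data structure, not device code; addresses and counts are illustrative):

```python
# Sketch of indirect next hop: many prefixes reference one shared
# next-hop object, so a path change updates one object instead of
# refreshing every per-prefix forwarding entry.
shared_nhop = {"path": "A->B->D"}            # single shared object
fib = {f"10.0.{i}.0/24": shared_nhop for i in range(256)}  # all point to it

shared_nhop["path"] = "A->C->D"              # IGP path fails: one update
# Every prefix now forwards over the new path without per-prefix work.
assert all(entry["path"] == "A->C->D" for entry in fib.values())
```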
If Device A and Device D establish a multi-hop EBGP peer relationship, the convergence
procedure is the same as the preceding one. Indirect next hop also applies to the iteration of a
multi-hop EBGP route.
In Figure 1-717, a neighbor relationship is established between PE1 and PE2, and PE2
receives 100,000 VPN routes from PE1. These routes have the same original BGP next hop.
After being iterated, these VPN routes eventually follow the same public network tunnel
(tunnel 1). If tunnel 1 fails, these routes do not need to be iterated separately, and the relevant
forwarding entries do not need to be refreshed one by one. Note that only the shared next hop
needs to be iterated, and the relevant forwarding entries need to be refreshed. Consequently,
these VPN routes converge to tunnel 2 on the forwarding plane. In this manner, the
convergence time depends on only the number of next hops, not the number of prefixes.
1.10.2.2.14 Multi-Topology
Multi-Topology Overview
On a traditional IP network, only one unicast topology exists, and only one unicast forwarding
table is available on the forwarding plane. Services transmitted from one router to the same
destination address are therefore forced to share the same next hop, and various end-to-end
services, such as voice and data services, must share the same physical links. As a result, some
links may become heavily congested while others remain relatively idle. To address this
problem, configure multi-topology to divide a physical network into different logical
topologies for different services.
By default, the base topology is created on the public network. The class-specific topology
can be added or deleted in the public network address family view. Each topology contains its
own routing table. The class-specific topology supports the addition, deletion, and import of
protocol routes.
The base topology cannot be deleted.
Background
A VRRP backup group is configured on Device1 and Device2 on the network shown in Figure
1-718. Device1 is a master device, whereas Device2 is a backup device. The VRRP backup
group serves as a gateway for users. User-to-network traffic travels through Device1.
However, network-to-user traffic may travel through Device1, Device2, or both of them over
a path determined by a dynamic routing protocol. Therefore, user-to-network traffic and
network-to-user traffic may travel along different paths, which interrupts services if firewalls
are attached to devices in the VRRP backup group, complicates traffic monitoring or statistics
collection, and increases costs.
To address the preceding problems, the routing protocol is expected to select a route passing
through the master device so that the user-to-network and network-to-user traffic travels along
the same path. Association between direct routes and a VRRP backup group can meet
expectations by allowing the dynamic routing protocol to select a route based on the VRRP
status.
Figure 1-718 Association between direct routes and a VRRP backup group
Related Concepts
VRRP is a widely used fault-tolerant protocol that groups multiple routing devices into a
backup group, improving network reliability. A VRRP backup group consists of a master
device and one or more backup devices. If the master device fails, the VRRP backup group
switches services to a backup device to ensure communication continuity and reliability.
A device in a VRRP backup group operates in one of three states:
Master: If a network is working correctly, the master device transmits all services.
Backup: If the master device fails, the VRRP backup group selects a backup device as
the new master device to take over traffic and ensure uninterrupted service transmissions.
Initialize: A device in the Initialize state is waiting for an interface Startup message to
switch its status to Master or Backup.
For details about VRRP, see HUAWEI NE20E-S2 Universal Service Router Feature Description -
Network Reliability - VRRP.
Implementation
Association between direct routes and a VRRP backup group allows VRRP interfaces to
adjust the costs of direct network segment routes based on the VRRP status. The direct route
with the master device as the next hop has the lowest cost. A dynamic routing protocol
imports the direct routes and selects the direct route with the lowest cost. For example, VRRP
interfaces on Device1 and Device2 on the network shown in Figure 1-718 are configured with
association between direct routes and the VRRP backup group. The implementation is as
follows:
Device1 in the Master state sets the cost of its route to the directly connected virtual IP
network segment to 0 (default value).
Device2 in the Backup state increases the cost of its route to the directly connected
virtual IP network segment.
A dynamic routing protocol selects the route with Device1 as the next hop because this route
costs less than the other route. Therefore, both user-to-network and network-to-user traffic
travels through Device1.
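The cost adjustment can be sketched as follows (a minimal model of the behavior described above; the increased cost value is an illustrative assumption):

```python
# Sketch of associating direct routes with VRRP status: the VRRP
# interface raises the cost of the direct network-segment route when
# the device is Backup, so the IGP prefers the Master's route.
def direct_route_cost(vrrp_state, increased_cost=100):
    return 0 if vrrp_state == "Master" else increased_cost

costs = {"Device1": direct_route_cost("Master"),
         "Device2": direct_route_cost("Backup")}
best = min(costs, key=costs.get)   # the route the IGP selects
assert best == "Device1"           # both traffic directions go via the master
```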
Usage Scenario
When a data center is used, firewalls are attached to devices in a VRRP backup group to
improve network security. Network-to-user traffic cannot pass through a firewall if it travels
over a path different from the one used by user-to-network traffic.
When an IP radio access network (RAN) is configured, VRRP is configured to set the
master/backup status of aggregation site gateways (ASGs) and radio service gateways (RSGs).
Network-to-user and user-to-network traffic may pass through different paths, complicating
network operation and management.
Association between direct routes and a VRRP backup group can address the preceding
problems by ensuring the user-to-network and network-to-user traffic travels along the same
path.
1.10.2.2.16 Direct Routes Responding to L3VE Interface Status Changes After a Delay
Background
In Figure 1-719, a Layer 2 virtual private network (VPN) connection is set up between each
AGG and the CSG through L2 virtual Ethernet (VE) interfaces, and BGP VPNv4 peer
relationships are set up between the AGGs and RSGs on an L3VPN. L3VE interfaces are
configured on the AGGs, and VPN instances are bound to the L3VE interfaces so that the
CSG can access the L3VPN. BGP is configured on the AGGs to import direct routes between
the CSG and AGGs. The AGGs convert these direct routes to BGP VPNv4 routes before
advertising them to the RSGs.
AGG1 functions as the master device in Figure 1-719. In most cases, the RSGs select routes
advertised by AGG1, and traffic travels along Link A. If AGG1 or the CSG-AGG1 link fails,
traffic switches over to Link B. After AGG1 or the CSG-AGG1 link recovers, the L3VE
interface on AGG1 goes from Down to Up, and AGG1 immediately generates a direct route
destined for the CSG and advertises the route to the RSGs. Downstream traffic then switches
over to Link A. However, AGG1 has not learned the MAC address of the NodeB yet. As a
result, downstream traffic is lost.
To address this problem, configure the direct route to respond to L3VE interface status
changes after a delay. After you configure the delay, the RSG preferentially selects routes
advertised by AGG1 only after AGG1 learns the MAC address of the NodeB.
Figure 1-719 Networking for the direct route responding to L3VE interface status changes after a
delay
Implementation
After you configure the direct route to respond to L3VE interface status changes after a delay,
the cost of the direct route between the CSG and AGG1 is modified to the configured cost
(greater than 0) when the L3VE interface on AGG1 goes from Down to Up. After the
configured delay expires, the cost of the direct route to the CSG restores to the default value 0.
Because BGP has imported the direct route and has advertised it to RSGs, the cost value
determines whether RSGs preferentially select the direct route.
RSGs preferentially transmit traffic over Link B before AGG1 has learned the MAC address
of the NodeB, which reduces traffic loss.
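The delayed restoration can be sketched as follows (a conceptual model; the delay and cost values are illustrative, and on the device both are configured explicitly):

```python
# Sketch of delayed cost restoration: after the L3VE interface goes
# Up, the direct route keeps a raised cost until the delay expires,
# by which time AGG1 has learned the NodeB's MAC address.
def direct_route_cost(seconds_since_up, delay=10, raised_cost=100):
    return raised_cost if seconds_since_up < delay else 0

assert direct_route_cost(seconds_since_up=2) == 100   # RSGs still prefer Link B
assert direct_route_cost(seconds_since_up=15) == 0    # traffic returns to Link A
```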
Usage Scenario
This feature applies to IP radio access networks (RANs) on which an L2VPN accesses an
L3VPN.
Background
In Figure 1-720, PWs are set up between the AGGs and the CSG. BGP virtual private network
version 4 (VPNv4) peer relationships are set up between the AGGs and RSGs. Layer 3 virtual
Ethernet (L3VE) interfaces are configured on the AGGs, and VPN instances are bound to the
L3VE interfaces so that the CSG can access the L3VPN. BGP is configured on the AGGs to
import direct routes between the CSG and AGGs. The AGGs convert these direct routes to
BGP VPNv4 routes before advertising them to the RSGs.
AGG1 functions as the master device in Figure 1-720. In most cases, the RSGs select routes
advertised by AGG1, and traffic travels along Link A. If AGG1 or the CSG-AGG1 link fails,
traffic switches over to Link B. After AGG1 or the CSG-AGG1 link recovers, the L3VE
interface on AGG1 goes from Down to Up, and AGG1 immediately generates a direct route
destined for the CSG and advertises the route to the RSGs. Downstream traffic then switches
over to Link A. However, PW1 is on standby. As a result, downstream traffic is lost.
To address this problem, associate the direct route and PW status. After the association is
configured, the RSG preferentially selects the direct route only after PW1 becomes active.
Figure 1-720 Networking for the association between the direct route and PW status
Implementation
Configuring the association between the direct route and PW status allows a VE interface to
adjust the cost value of the direct route based on PW status. The cost value determines
whether the RSGs preferentially select the direct route because BGP has imported the direct
route and has advertised it to RSGs. For example, if you associate the direct route and PW
status on the network shown in Figure 1-720, the implementation is as follows:
When PW1 becomes active, the cost value of the direct route between the CSG and
AGG1 restores to the default value 0. RSGs preferentially transmit traffic over Link A.
When PW1 is on standby, the cost value of the direct route between the CSG and AGG1
is modified to a configured value (greater than 0). RSGs preferentially transmit traffic
over Link B, which reduces traffic loss.
Usage Scenario
This feature applies to IP radio access networks (RANs) on which primary/secondary PWs are
configured between the CSG and AGGs.
Background
By default, IPv4 Address Resolution Protocol (ARP) Vlink direct routes or IPv6 Neighbor
Discovery Protocol (NDP) Vlink direct routes are only used for packet forwarding in the same
VLAN and cannot be imported to dynamic routing protocols. This is because importing Vlink
direct routes to dynamic routing protocols will increase the number of routing entries and
affect routing table stability. In some cases, some operations need to be performed based on
Vlink direct routes of VLAN users. For example, different VLAN users use different route
exporting policies to guide traffic from the remote device. In this scenario, ARP or NDP Vlink
direct routes need to be imported by a dynamic routing protocol and advertised to the
remote device. After advertisement of ARP or NDP Vlink direct routes is enabled, these direct
routes can be imported by a dynamic routing protocol (IGP or BGP) and advertised to the
remote device.
Related Concepts
ARP Vlink direct routes: routing entries that record the physical interfaces of VLAN users and
are used to forward IP packets. These physical interfaces are learned using ARP. On networks with
VLANs, IP packets can be forwarded only by physical interfaces rather than VLANIF
interfaces because VLANIF interfaces are logical interfaces that consist of multiple physical
interfaces.
NDP Vlink direct routes: routing entries carrying IPv6 addresses of VLAN users' physical
interfaces. These IPv6 addresses are learned and resolved using NDP.
Implementation
On the network shown in Figure 1-721, Device A, Device B, and Device C are connected to
the VLANIF interface of Device D which is a Border Gateway Protocol (BGP) peer of Device
E. However, Device E needs to communicate only with Device B rather than Device A and
Device C. In this scenario, Vlink direct route advertisement must be enabled on Device D.
Then Device D obtains each physical interface of Device A, Device B, and Device C, uses a
routing policy to filter out network segment routes and routes destined for Device A and
Device C, and advertises the route destined for Device B to Device E.
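The export behavior on Device D can be sketched as follows (an illustration of the routing-policy effect; the addresses are hypothetical):

```python
# Sketch of Device D's export policy: advertise only the ARP Vlink
# host route for Device B, filtering out the VLAN network segment
# route and the host routes for Device A and Device C.
vlink_routes = {
    "DeviceA": "192.168.1.1/32",
    "DeviceB": "192.168.1.2/32",
    "DeviceC": "192.168.1.3/32",
}
segment_route = "192.168.1.0/24"

def export_policy(route):
    """Permit only the host route destined for Device B."""
    return route == vlink_routes["DeviceB"]

candidates = list(vlink_routes.values()) + [segment_route]
advertised = [r for r in candidates if export_policy(r)]
assert advertised == ["192.168.1.2/32"]   # only Device B's route reaches Device E
```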
Usage Scenario
Vlink direct route advertisement is applicable to networks in which a device needs to add
Vlink direct routes with physical interfaces of VLAN users to the routing table of a dynamic
routing protocol before advertising the routes to remote ends.
Advantages
With Vlink direct route advertisement, a device can add Vlink direct routes to the routing
table of a dynamic routing protocol (such as an Interior Gateway Protocol or BGP) and then
use different export policies to advertise routes required by remote ends.
1.10.2.3 Applications
1.10.2.3.1 Typical Application of IP FRR
In Figure 1-722, CE1 is dual-homed to PE1 and PE2. CE1 is configured with two outbound
interfaces and two next hops. Link B functions as the backup of link A. If link A fails, traffic
can be rapidly switched to link B.
1.10.2.3.2 Data Center Applications of Association Between Direct Routes and a VRRP
Backup Group
Service Overview
A data center, used for service access and transmission, consists of many servers, disk arrays,
security devices, and network devices that store and process a great number of services and
applications. Firewalls are used to improve data security, and VRRP backup groups are
configured to improve communication reliability. VRRP may cause user-to-network traffic
and network-to-user traffic to travel along different paths, and as a result, the firewall may
discard the network-to-user traffic because of path inconsistency. To address this problem,
association between direct routes and a VRRP backup group must be configured.
Networking Description
Figure 1-723 shows a data center network. A server functions as a core service module in the
data center. A VRRP backup group protects data exchanged between the server and core
devices, improving service security. Firewalls are attached to devices in the VRRP backup
group to improve network security.
Feature Deployment
The master device transmits server traffic to a core device. When the core device attempts to
send traffic to the server, the traffic can only pass through a firewall attached to the master
device. On the network shown in Figure 1-723, the server sends data destined for the core
device through the master device, and the core device sends data destined for the server along
a path that an Interior Gateway Protocol (IGP) selects. The association between the direct
routes and a VRRP backup group can be configured on switch A and switch B so that the IGP
selects a route based on VRRP status. The IGP forwards core-device-to-server traffic over the
same path as the one over which server-to-core-device traffic is transmitted, which prevents
the firewall from discarding traffic.
Service Overview
NodeBs and radio network controllers (RNCs) on an IP radio access network (RAN) do not
have dynamic routing capabilities. Therefore, static routes must be configured to allow
NodeBs to communicate with aggregation site gateways (ASGs) and allow RNCs to
communicate with remote service gateways (RSGs) that are at the aggregation layer. VRRP is
configured to provide ASG and RSG redundancy, improving device reliability and ensuring
non-stop transmission of value-added services, such as voice, video, and cloud computing
services over mobile bearer networks.
Networking Description
Figure 1-724 shows VRRP-based gateway protection applications on an IPRAN. A NodeB is
dual-homed to VRRP-enabled ASGs to communicate with the aggregation network. The
NodeB sends traffic destined for the RNC through the master ASG, while the RNC sends
traffic destined for the NodeB through either the master or backup ASG over a path selected
by a dynamic routing protocol. As a result, traffic in opposite directions may travel along
different paths. Similarly, the RNC is dual-homed to VRRP-enabled RSGs. Path inconsistency
may also occur.
Feature Deployment
On the IPRAN shown in Figure 1-724, both ASGs and RSGs may send and receive traffic
over different paths. For example, user-to-network traffic enters the aggregation network
through the master ASG, while network-to-user traffic flows out of the aggregation network
from the backup ASG. Path inconsistency complicates traffic monitoring or statistics
collection and increases the cost. In addition, when the master ASG is working properly, the
backup ASG also transmits services, which is counterproductive to VRRP redundancy backup
implementation. Association between direct routes and the VRRP backup group can be
configured to ensure path consistency.
On the NodeB side, the direct network segment routes of ASG VRRP interfaces can be
associated with VRRP status. The route with the master ASG as the next hop has a lower cost
than the route with the backup ASG as the next hop. The dynamic routing protocol imports
the direct routes and selects the route with a lower cost, ensuring path consistency.
Implementation on the RNC side is similar to that on the NodeB side.
Protocol UDP Port TCP Port
DHCP 67 -
DNS 53 53
FTP - 20/21
HTTP - 80
IMAP - 993
NetBIOS 137/138 137/139
POP3 - 995
SMB 445 445
SMTP 25 25
SNMP 161 -
TELNET - 23
TFTP 69 -
Note that "-" indicates that the related transport layer protocol is not used.
Terms
ARP Vlink direct routes: IP packets are forwarded through a specified physical interface. IP
packets cannot be forwarded through a VLANIF interface, because a VLANIF interface is a
logical interface with several physical interfaces as its member interfaces. If an IPv4 packet
reaches a VLANIF interface, the device obtains information about the physical interface using
ARP and generates the relevant routing entry. The route recorded in the routing entry is called
an ARP Vlink direct route.
FRR: FRR is applicable to services that are very sensitive to packet loss and delay. When a
fault is detected at the lower layer, the lower layer informs the upper layer routing system of
the fault. Then, the routing system forwards packets through a backup link. In this manner, the
impact of the link fault on services is minimized.
NDP Vlink direct routes: IP packets are forwarded through a specified physical interface. IP
packets cannot be forwarded through a VLANIF interface, because a VLANIF interface is a
logical interface with several physical interfaces as its member interfaces. If an IPv6 packet
reaches a VLANIF interface, the device obtains information about the physical interface using
the Neighbor Discovery Protocol (NDP) and generates the relevant routing entry. The route
recorded in the routing entry is called an NDP Vlink direct route.
UNR: When a user goes online through a Layer 2 device, such as a switch, but there is no
available Layer 3 interface and the user is assigned an IP address, no dynamic routing
protocol can be used. To enable devices to use IP routes to forward the traffic of this user,
Huawei User Network Route (UNR) technology assigns a route to forward the traffic of the
user.
Definition
Static routes are special routes that are configured by network administrators.
Purpose
On a simple network, only static routes can ensure that the network runs properly. If a router
cannot run dynamic routing protocols or cannot generate routes to a destination network, you
can configure static routes on the router.
Route selection can be controlled using static routes. Properly configuring and using static
routes can improve network performance and guarantee the required bandwidth for important
applications. When a network fault occurs or the network topology changes, however, static
routes must be changed manually by the administrator.
1.10.3.2 Principles
1.10.3.2.1 Components
On the NE20E, you can run the ip route-static command to configure a static route, which
consists of the following components:
Destination address and mask
Outbound interface and next hop address
When forwarding a packet, the device selects a route based on the longest match rule. The
device can find the associated link layer address to forward the packet only when the next hop
address of the packet is available.
When specifying an outbound interface, note the following:
For a Point-to-Point (P2P) interface, if the outbound interface is specified, the next hop
address is the address of the remote interface connected to the outbound interface. For
example, when a GE interface is encapsulated with Point-to-Point Protocol (PPP) and
obtains the remote IP address through PPP negotiation, you need to specify only the
outbound interface rather than the next hop address.
Non-Broadcast Multiple-Access (NBMA) interfaces are applicable to
Point-to-Multipoint networks. Therefore, IP routes and the mappings between IP
addresses and link layer addresses are required. In this case, you need to configure next
hop addresses.
An Ethernet interface is a broadcast interface and a virtual-template (VT) interface can
be associated with multiple virtual access (VA) interfaces. If the Ethernet interface or the
VT interface is specified as the outbound interface of a static route, the next hop cannot
be determined because multiple next hops exist. Therefore, do not specify an Ethernet
interface or a VT interface as the outbound interface unless necessary. If you need to
specify a broadcast interface (such as an Ethernet interface) or a VT interface as the
outbound interface, specify the associated next hop address.
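The interface-type rules above can be sketched as a simple check (an illustration only; the interface type names are generic, not device keywords):

```python
# Sketch of the rule above: whether a static route requires an
# explicit next hop depends on the outbound interface type.
def needs_next_hop(iface_type):
    # P2P: the remote address is implied by the link itself.
    # Broadcast (Ethernet), NBMA, and VT interfaces have multiple
    # possible next hops, so one must be specified.
    return iface_type in ("ethernet", "nbma", "vt")

assert not needs_next_hop("p2p")
assert needs_next_hop("ethernet")
assert needs_next_hop("nbma")
```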
1.10.3.2.2 Applications
In Figure 1-725, the network topology is simple, and network communication can be
implemented through static routes. You need to specify an address for each physical network,
identify indirectly connected physical networks for each router, and configure static routes for
indirectly connected physical networks.
In Figure 1-725, static routes to networks 3, 4, and 5 need to be configured on Device A; static
routes to networks 1 and 5 need to be configured on Device B; static routes to networks 1, 2,
and 3 need to be configured on Device C.
If the destination address of a packet does not match any entry in the routing table, the packet is discarded. An
Internet Control Message Protocol (ICMP) packet is then sent, informing the originating host
that the destination host or network is unreachable.
The static route with the destination address and mask 0s (0.0.0.0 0.0.0.0) configured using
the ip route-static command is a default route intended to simplify network configuration.
In Figure 1-725, because the next hop of the packets from Device A to networks 3, 4, and 5 is
Device B, a default route can be configured on Device A to replace the three static routes
destined for networks 3, 4, and 5. Similarly, only a default route from Device C to Device B
needs to be configured to replace the three static routes destined for networks 1, 2, and 3.
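The longest match rule with a default-route fallback can be sketched as follows (a minimal model; the prefixes and labels are illustrative):

```python
# Sketch of longest-prefix-match lookup with a default route: the
# most specific matching route wins, and 0.0.0.0/0 catches anything
# no other entry covers.
import ipaddress

table = {
    "10.1.0.0/16": "to network 1",
    "0.0.0.0/0": "default via Device B",
}

def lookup(dst):
    addr = ipaddress.ip_address(dst)
    matches = [p for p in table if addr in ipaddress.ip_network(p)]
    best = max(matches, key=lambda p: ipaddress.ip_network(p).prefixlen)
    return table[best]

assert lookup("10.1.2.3") == "to network 1"            # specific route wins
assert lookup("172.16.0.1") == "default via Device B"  # default route fallback
```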
1.10.3.2.3 Functions
BFD session so that the BFD session can detect the status of the link that the static route
passes through.
After BFD for static routes is configured, each static route can be associated with a BFD
session. In addition to route selection rules, whether a static route can be selected as the
optimal route is subject to BFD session status.
If the BFD session associated with a static route detects a link failure, the session goes
Down and reports the failure to the system. The system then deletes the static route from
the IP routing table.
If the BFD session associated with a static route detects that the faulty link has recovered,
the session goes Up and reports the recovery to the system. The system then adds the
static route to the IP routing table again.
By default, a static route can still be selected even though the BFD session associated
with it is AdminDown (triggered by the shutdown command run either locally or
remotely). If a device is restarted, the BFD session needs to be re-negotiated. In this case,
whether the static route associated with the BFD session can be selected as the optimal
route is subject to the re-negotiated BFD session status.
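The selection rule can be sketched as a predicate (a minimal model of the default behavior described above):

```python
# Sketch of BFD-gated selection: a static route stays selectable
# unless its associated BFD session is Down; AdminDown does not
# disqualify it (the default behavior), and a route with no bound
# session (None) is unaffected by BFD.
def selectable(bfd_state):
    return bfd_state in ("Up", "AdminDown", None)

assert selectable("Up")
assert selectable("AdminDown")   # shutdown run locally or remotely
assert selectable(None)          # no BFD session bound
assert not selectable("Down")    # link failure detected
```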
BFD for static routes has two detection modes:
Single-hop detection
In single-hop detection mode, the configured outbound interface and next hop address
are the information about the directly connected next hop. The outbound interface
associated with the BFD session is the outbound interface of the static route, and the peer
address is the next hop address of the static route.
Multi-hop detection
In multi-hop detection mode, only the next hop address is configured. Therefore, the
static route must be iterated to the directly connected next hop and outbound interface.
The peer address of the BFD session is the original next hop address of the static route,
and the outbound interface is not specified. In most cases, the original next hop to be
iterated is an indirect next hop. Multi-hop detection is performed on the static routes that
support route iteration.
For details about BFD, see the HUAWEI NE20E-S2 Universal Service Router Feature Description -
Reliability.
Background
Static routes do not have a dedicated detection mechanism. If a link fails, a network
administrator must manually delete the corresponding static route from the IP routing table.
This process delays link switchovers and can cause lengthy service interruptions.
Bidirectional Forwarding Detection (BFD) for static routes can use BFD sessions to monitor
the link status of a static route. However, both ends of the link must support BFD. Network
quality analysis (NQA) for static routes, in contrast, can monitor the link status of a static
route even if only one end supports NQA.
Table 1-185 compares BFD and NQA for static routes.
Table 1-185 Comparison between BFD and NQA for static routes
Related Concepts
NQA monitors network quality of service (QoS) in real time. If a network fails, NQA can be
used to diagnose the fault.
NQA relies on a test instance to monitor link status. The two ends of an NQA test are called
the NQA client and the NQA server. The NQA client initiates an NQA test that can return any
of the following results:
success: The test is successful. NQA instructs the routing management module to set the
static route to active and add the static route to the routing table.
failed: The test fails. NQA instructs the routing management module to set the static
route to inactive and delete the static route from the routing table.
no result: The test is running and no result has been obtained, which does not change the
status of the static route.
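The three test results above map directly onto routing-management actions. The following Python sketch models that mapping with hypothetical names (the routing table is simplified to a set of prefixes); it is not the device implementation:

```python
def apply_nqa_result(result: str, routing_table: set, route: str) -> str:
    """Map an NQA test result to a routing-management action (sketch).

    'success'   -> route set to active and added to the routing table
    'failed'    -> route set to inactive and deleted from the table
    'no result' -> test still running; route status unchanged
    """
    if result == "success":
        routing_table.add(route)
        return "active"
    if result == "failed":
        routing_table.discard(route)
        return "inactive"
    return "unchanged"
```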
For NQA details, see the chapter "System Monitor" in the HUAWEI NE20E-S2 Universal Service
Router Feature Description.
Implementation
NQA for static routes associates an NQA test instance with a static route and uses the NQA
test instance to monitor the link. The routing management module determines whether a static
route is active or inactive based on the test result. If the static route is inactive, the routing
management module deletes it from the IP routing table and selects a backup link for data
forwarding, which prevents lengthy service interruptions.
In Figure 1-728, each access switch is connected to 10 clients, and a total of 100 clients are
connected. Because no dynamic routing protocol can be deployed between Device B and the
clients, static routes to the clients must be configured on Device B, and backup static routes to
the clients can be configured on Device C.
Device A, Device B, and Device C run a dynamic routing protocol and learn routes from one
another. Device B and Device C are configured to import static routes to the routing table of
the dynamic routing protocol, and different costs are set for the static routes. Device A can
contact Device B and Device C using the dynamic routing protocol to learn routes to the
clients. Device A selects one primary link and one backup link based on link costs.
NQA for static routes, configured on Device B, uses NQA test instances to monitor the status
of the primary link. If the primary link fails, the corresponding static route is deleted and
network-to-client traffic switches to the backup link. When both the primary and backup links
are running properly, network-to-client traffic is preferentially transmitted along the primary
link.
NQA test instances support both IPv4 and IPv6 static routes. The mechanisms for monitoring IPv4 and
IPv6 static routes are the same.
Each static route can be associated with only one NQA test instance.
Usage Scenario
NQA for static routes applies to a network on which BFD for static routes cannot be deployed
due to device connectivity limitations. For example, switches, optical line terminals (OLTs),
digital subscriber line access multiplexers (DSLAMs), multiservice access nodes (MSANs),
or x digital subscriber lines (xDSLs) exist on the network.
Benefits
NQA for static routes can monitor the link status of static routes and implement rapid
primary/backup link switchovers, preventing lengthy service interruptions.
Background
When the link over which a static route runs fails, the static route will be deleted from the IP
routing table to trigger a route re-selection. After a new route is selected, traffic is switched to
the new route. Some carriers, however, may require that specific traffic always travel along a
fixed link, regardless of the link status. Static route permanent advertisement is introduced to
meet this service need.
Implementation
With static route permanent advertisement, a static route can still be advertised and added to
the IP routing table for route selection even when the link over which the static route runs
fails. After static route permanent advertisement is configured, the static route can be
advertised and added to the IP routing table in both of the following scenarios:
An outbound interface is configured for the static route, and the outbound interface has
an IP address. Static route permanent advertisement takes effect regardless of whether
the outbound interface is Up.
No outbound interface is configured for the static route. Static route permanent
advertisement takes effect regardless of whether the static route can obtain an outbound
interface through route iteration.
After static route permanent advertisement is enabled, a static route always remains in the IP routing
table regardless of route reachability. If the destination of the route becomes unreachable, traffic
interruption occurs.
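The selection rule above reduces to a single condition, sketched here in Python with hypothetical names: without permanent advertisement, the route tracks the link state; with it, the route is always selectable, at the cost of possible traffic black-holing when the link fails.

```python
def route_selectable(link_up: bool, permanent_advertisement: bool) -> bool:
    """Whether a static route stays in the IP routing table (sketch).

    Without permanent advertisement, the route follows the link state.
    With it, the route is always advertised and selectable, even if the
    link has failed (which can cause a traffic interruption).
    """
    return True if permanent_advertisement else link_up
```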
Typical Networking
On the network shown in Figure 1-729, BR1, BR2, and BR3 belong to ISP1, ISP2, and ISP3
respectively. Two links (Link A and Link B) exist between BR1 and BR2, but ISP1 expects its
service traffic destined for ISP2 to be always transmitted over Link A.
A direct EBGP peer relationship is established between BR1 and BR2. A static route is created
on BR1, with 10.1.1.2/24 (IP address of BR2) as the destination address and the local
interface connected to BR2 as the outbound interface.
Without static route permanent advertisement, Link A is used to transmit traffic. If Link A
fails, BGP will switch the traffic to Link B.
With static route permanent advertisement, Link A is used to transmit traffic regardless of
whether the destination is reachable through Link A. If Link A fails, no link switchover is
performed, causing traffic interruption. To check whether the destination is reachable through
the static route, ping the destination address of the static route to which static route permanent
advertisement is applied.
1.10.4 RIP
1.10.4.1 Introduction
Definition
Routing Information Protocol (RIP) is a simple Interior Gateway Protocol (IGP). RIP is used
in small-scale networks, such as campus networks and simple regional networks.
As a distance-vector routing protocol, RIP exchanges routing information through User
Datagram Protocol (UDP) packets with port number 520.
RIP employs the hop count as the metric to measure the distance to the destination. In RIP, by
default, the number of hops from the router to its directly connected network is 0; the number
of hops from the router to a network that is reachable through another router is 1, and so on.
The hop count (the metric) equals the number of routers along the path from the local network
to the destination network. To speed up route convergence, RIP defines the hop count as an
integer that ranges from 0 to 15. A hop count that is equal to or greater than 16 is classified as
infinite, indicating that the destination network or host is unreachable. Due to the hop limit,
RIP is not applicable to large-scale networks.
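The hop-count metric and the value 16 as "infinity" can be expressed in a few lines of Python. This is an illustrative sketch of the rule above, not protocol code:

```python
RIP_INFINITY = 16  # a hop count of 16 or more means "unreachable"

def rip_metric(hops_from_router: int) -> int:
    """Clamp a hop count to the RIP metric range (sketch).

    0 = directly connected network; each router crossed adds 1;
    anything at or beyond 16 is treated as infinite.
    """
    return min(hops_from_router, RIP_INFINITY)

def reachable(metric: int) -> bool:
    return metric < RIP_INFINITY
```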
RIP has two versions: RIP version 1 (RIP-1) and RIP version 2 (RIP-2).
Purpose
As the earliest IGP, RIP is used in small and medium-sized networks. Its implementation is
simple, and the configuration and maintenance of RIP are easier than those of Open Shortest
Path First (OSPF) and Intermediate System-to-Intermediate System (IS-IS). Therefore, RIP is
widely used on live networks.
1.10.4.2 Principles
RIP is a distance-vector routing protocol. It exchanges packets through UDP and uses timers to
control the advertisement, update, and aging of routing information. However, design defects
in RIP may cause routing loops. Therefore, split horizon, poison reverse, and triggered update
were introduced into RIP to prevent routing loops.
In addition, RIP periodically advertises its routing table to neighbors, and route
summarization was introduced to reduce the size of the routing table.
1.10.4.2.1 RIP-1
RIP version 1 (RIP-1) is a classful routing protocol, which supports only the broadcast of
protocol packets. Figure 1-730 shows the format of a RIP-1 packet. A RIP packet can carry a
maximum of 25 routing entries. RIP is based on UDP, and a RIP-1 packet cannot be longer
than 512 bytes. RIP-1 packets do not carry any mask information, and RIP-1 can identify only
the routes to natural network segments, such as Class A, Class B, and Class C. Therefore,
RIP-1 does not support route summarization or discontinuous subnets.
1.10.4.2.2 RIP-2
RIP version 2 (RIP-2) is a classless routing protocol. Figure 1-731 shows the format of a
RIP-2 packet.
1.10.4.2.3 Timers
RIP uses the following timers:
Update timer: The Update timer periodically triggers Update packet transmission. By
default, the interval at which Update packets are sent is 30s.
Age timer: If a RIP device does not receive any packets from its neighbor to update a
route before the route expires, the RIP device considers the route unreachable. By default,
the age timer interval is 180s.
Garbage-collect timer: If a route becomes invalid after the age timer expires or a route
unreachable message is received, the route is placed into a garbage queue instead of
being immediately deleted from the RIP routing table. The garbage-collect timer
monitors the garbage queue and deletes expired routes. If an Update packet of a route is
received before the garbage-collect timer expires, the route is placed back into the age
queue. The garbage-collect timer is set to avoid route flapping. By default, the garbage
collect timer interval is 120s.
Hold-down timer: If a RIP device receives an Update packet carrying a route with a cost
of 16 from a neighbor, the route enters the holddown state, and the hold-down timer is
started. To prevent route flapping, the RIP device does not accept any update for the
route until the hold-down timer expires, even if the advertised cost is less than 16,
except in the following scenarios:
a. The cost carried in the Update packet is less than or equal to that carried in the last
Update packet.
b. The hold-down timer expires, and the corresponding route enters the Garbage state.
The relationship between RIP routes and the four timers is as follows:
The advertisement of RIP routing updates is triggered by the update timer, whose default
value is 30 seconds.
Each routing entry is associated with two timers: the age timer and garbage-collect timer.
a. Each time a route is learned and added to the routing table, the age timer is started.
b. If no Update packet is received from the neighbor within 180 seconds after the age
timer is started, the metric of the corresponding route is set to 16, and the
garbage-collect timer is started.
If no Update packet is received within 120 seconds after the garbage-collect timer is
started, the corresponding routing entry is deleted from the routing table after the
garbage-collect timer expires.
By default, the hold-down timer is disabled. If you configure a hold-down timer, it starts
after the system receives a route with a cost of 16 or greater from its neighbor.
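The age timer and garbage-collect timer relationship above can be sketched as a small state function in Python (a simplified model with hypothetical names; real implementations restart the age timer on every received Update packet):

```python
def route_state(seconds_since_update: int,
                age_timeout: int = 180,
                garbage_timeout: int = 120) -> str:
    """State of a RIP routing entry as its timers expire (sketch).

    While Update packets keep arriving, the age timer restarts and the
    route stays valid. 180 s after the last update, the metric is set to
    16 and the garbage-collect timer starts; 120 s later, the entry is
    deleted from the routing table.
    """
    if seconds_since_update < age_timeout:
        return "valid"
    if seconds_since_update < age_timeout + garbage_timeout:
        return "garbage (metric 16, awaiting deletion)"
    return "deleted"
```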
In Figure 1-732, Device A sends Device B a route to 10.0.0.0/8. If split horizon is not
configured, Device B will send this route back to Device A after learning it from Device A. As
a result, Device A learns the following routes to 10.0.0.0/8:
A direct route with zero hops
A route with Device B as the next hop and a total of two hops
Only direct routes, however, are active in the RIP routing table of Device A.
If the route from Device A to 10.0.0.0/8 becomes unreachable and Device B is not notified,
Device B still considers the route to 10.0.0.0/8 reachable and continues sending this route to
Device A. Then, Device A receives incorrect routing information and considers the route to
10.0.0.0/8 reachable through Device B; Device B considers the route to 10.0.0.0/8 reachable
through Device A. As a result, a loop occurs on the network.
After split horizon is configured, Device B no longer sends the route back after learning it,
which prevents such a loop. With split horizon, an interface records the neighbor a route was
learned from, and the interface will not send the routes back to the neighbor it learned them
from.
In Figure 1-733, Device A sends the route to 10.0.0.0/8 that it learns from Device B only to
Device C.
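The split-horizon rule can be sketched as an advertisement filter in Python. This is an illustrative model with hypothetical names (the routing table is simplified to prefix/interface pairs), not device code:

```python
def routes_to_advertise(routing_table, out_interface):
    """Split horizon (sketch): do not advertise a route through the
    interface it was learned from.

    routing_table: list of (prefix, learned_interface) tuples;
    learned_interface is None for local (for example, direct) routes,
    which are advertised through every interface.
    """
    return [prefix for prefix, learned_if in routing_table
            if learned_if != out_interface]
```

In the Figure 1-733 scenario, a route learned on the interface toward Device B would be filtered from updates sent back toward Device B but still advertised toward Device C.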
In Figure 1-734, Device A sends Device B a route to 10.0.0.0/8. If poison reverse is not
configured, Device B will send this route back to Device A after learning it from Device A. As
a result, Device A learns the following routes to 10.0.0.0/8:
A direct route with zero hops
A route with Device B as the next hop and a total of two hops
Only direct routes, however, are active in the RIP routing table of Device A.
If the route from Device A to 10.0.0.0/8 becomes unreachable and Device B is not notified,
Device B still considers the route to 10.0.0.0/8 reachable and continues sending this route to
Device A. Then, Device A receives incorrect routing information and considers the route to
10.0.0.0/8 reachable through Device B; Device B considers the route to 10.0.0.0/8 reachable
through Device A. As a result, a loop occurs on the network.
With poison reverse, after Device B receives the route from Device A, Device B sends a route
unreachable message to Device A with cost 16. Device A then no longer learns the reachable
route from Device B, which prevents routing loops.
If both split horizon and poison reverse are configured, only poison reverse takes effect.
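Unlike split horizon, poison reverse still advertises the route back through the interface it was learned from, but with the cost forced to 16. A minimal Python sketch of that rule, with hypothetical names:

```python
RIP_INFINITY = 16

def advertised_metric(route_metric, learned_interface, out_interface):
    """Poison reverse (sketch): a route is advertised back through the
    interface it was learned from with its metric set to 16 (unreachable),
    so the neighbor can never learn the route back and form a loop."""
    if learned_interface == out_interface:
        return RIP_INFINITY
    return route_metric
```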
In Figure 1-735, if the route to 11.4.0.0 becomes unreachable, Device C learns the
information first. By default, a RIP-enabled device sends routing updates to its neighbors
every 30s. If Device C receives an Update packet from Device B within 30s while Device C is
still waiting to send Update packets, Device C learns the incorrect route to 11.4.0.0. In this
case, the next hops of the routes from Device B and Device C to network 11.4.0.0 are Device C
and Device B, respectively, which results in routing loops. If Device C sends an Update packet
to Device B immediately after it detects a network fault, Device B can rapidly update its
routing table, which prevents routing loops.
In addition, if the next hop of a route becomes unavailable due to a link failure, the local
device sets the cost of the route to 16 and then advertises the route immediately to its
neighbors. This process is called route poisoning.
In RIP-2, route summarization can reduce the size of the routing table and improve the
extensibility and efficiency of a large-scale network.
Route summarization has two modes:
Process-based classful summarization
Summarized routes are advertised with natural masks. If split horizon or poison reverse
is configured, classful summarization becomes invalid because split horizon or poison
reverse suppresses some routes from being advertised. In addition, when classful
summarization is configured, routes learned from different interfaces may be
summarized into a single route. As a result, a conflict occurs in the advertisement of the
summarized route.
For example, a RIP process summarizes the route 10.1.1.0/24 with metric 2 and the route
10.2.2.0/24 with metric 3 into the route 10.0.0.0/8 with metric 2.
Interface-based summarization
Users can specify a summary address.
For example, users can configure a RIP-enabled interface to summarize the route
10.1.1.0/24 with metric 2 and route 10.2.2.0/24 with metric 3 into the route 10.1.0.0/16
with metric 2.
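The metric rule in both examples is the same: the summarized route takes the smallest metric among its specific routes. A Python sketch of interface-based summarization (hypothetical names; the routing table is simplified to a prefix-to-metric mapping):

```python
import ipaddress

def summarize(routes, summary_prefix):
    """Interface-based RIP-2 route summarization (sketch).

    routes: dict mapping prefix string -> metric. All specific routes
    falling within summary_prefix collapse into a single route whose
    metric is the smallest metric among them.
    """
    summary = ipaddress.ip_network(summary_prefix)
    metrics = [m for p, m in routes.items()
               if ipaddress.ip_network(p).subnet_of(summary)]
    return (summary_prefix, min(metrics)) if metrics else None
```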
Background
Routing Information Protocol (RIP)-capable devices monitor the neighbor status by
periodically exchanging Update packets. During the time it takes a local device to detect a
link failure, a large number of packets may be lost. Bidirectional forwarding detection (BFD)
for RIP can speed up fault detection and route convergence, which improves network
reliability.
After BFD for RIP is configured on the router, BFD can detect a fault (if any) within
milliseconds and notify the RIP module of the fault. The router then deletes the route that
passes through the faulty link and switches traffic to a backup link. This process speeds up
RIP convergence.
Table 1-186 describes the differences before and after BFD for RIP is configured.
Table 1-186 Differences before and after BFD for RIP is configured
Related Concepts
The BFD mechanism bidirectionally monitors data protocol connectivity over the link
between two routers. After BFD is associated with a routing protocol, BFD can rapidly detect
a fault (if any) and notify the protocol module of the fault, which speeds up route convergence
and minimizes traffic loss.
BFD is classified into the following modes:
Static BFD
In static BFD mode, BFD session parameters (including local and remote discriminators)
must be configured, and requests must be delivered manually to establish BFD sessions.
Static BFD is applicable to networks on which only a few links require high reliability.
Dynamic BFD
In dynamic BFD mode, the establishment of BFD sessions is triggered by routing
protocols, and the local discriminator is dynamically allocated, while the remote
discriminator is obtained from BFD packets sent by the neighbor.
When a new neighbor relationship is set up, a BFD session is established based on the
neighbor and detection parameters, including source and destination IP addresses. When
a fault occurs on the link, the routing protocol associated with BFD can detect the BFD
session Down event. Traffic is switched to the backup link immediately, which
minimizes data loss.
Dynamic BFD is applicable to networks that require high reliability.
Implementation
For details about BFD implementation, see "BFD" in Universal Service Router Feature
Description - Reliability. Figure 1-736 shows a typical network topology for BFD for RIP.
Dynamic BFD for RIP implementation:
a. RIP neighbor relationships are established among Device A, Device B, and Device
C and between Device B and Device D.
b. BFD for RIP is enabled on Device A and Device B.
c. Device A calculates routes, and the next hop along the route from Device A to
Device D is Device B.
d. If a fault occurs on the link between Device A and Device B, BFD will rapidly
detect the fault and report it to Device A. Device A then deletes the route whose
next hop is Device B from the routing table.
e. Device A recalculates routes and selects a new path Device C → Device B →
Device D.
f. After the link between Device A and Device B recovers, a new BFD session is
established between the two routers. Device A then reselects an optimal link to
forward packets.
Static BFD for RIP implementation:
a. RIP neighbor relationships are established among Device A, Device B, and Device
C and between Device B and Device D.
b. Static BFD is configured on the interface that connects Device A to Device B.
c. If a fault occurs on the link between Device A and Device B, BFD will rapidly
detect the fault and report it to Device A. Device A then deletes the route whose
next hop is Device B from the routing table.
d. After the link between Device A and Device B recovers, a new BFD session is
established between the two routers. Device A then reselects an optimal link to
forward packets.
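Step c of the static procedure (and step d of the dynamic one) boils down to deleting every route whose next hop is the neighbor behind the failed BFD session. A Python sketch of that reaction, with hypothetical names and the routing table simplified to a prefix-to-next-hop mapping:

```python
def on_bfd_session_down(routing_table, failed_neighbor):
    """BFD for RIP reaction (sketch): when BFD reports the session to a
    neighbor as Down, delete every route whose next hop is that neighbor
    so that traffic can switch to a backup path after recalculation.

    routing_table: dict mapping prefix -> next hop.
    Returns the list of deleted prefixes.
    """
    deleted = [p for p, nh in routing_table.items() if nh == failed_neighbor]
    for p in deleted:
        del routing_table[p]
    return deleted
```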
Usage Scenario
BFD for RIP is applicable to networks that require high reliability.
Benefits
BFD for RIP improves network reliability and enables devices to rapidly detect link faults,
which speeds up route convergence on RIP networks.
Simple authentication: The authenticated party adds the configured password directly to
packets for authentication. This authentication mode provides the lowest password
security.
MD5 authentication: The authenticated party uses the Message Digest 5 (MD5)
algorithm to generate a ciphertext password and adds it to packets for authentication.
This authentication mode improves password security.
Keychain authentication: The authenticated party configures a keychain that changes
over time. This authentication mode further improves password security.
Keychain authentication improves RIP security by periodically changing the password
and the encryption algorithms. For details about Keychain, see "Keychain" in NE20E
Feature Description - Security.
HMAC-SHA256 authentication: The authenticated party uses the HMAC-SHA256
algorithm to generate a ciphertext password and adds it to packets for authentication.
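The HMAC-SHA256 mode above can be illustrated with Python's standard `hmac` module. This sketch shows only the digest computation and verification; the on-wire RIP authentication-entry format is not modeled here:

```python
import hashlib
import hmac

def authenticate(packet: bytes, key: bytes) -> bytes:
    """Compute an HMAC-SHA256 digest over a RIP packet (sketch).

    The sender attaches the digest to the packet; the receiver recomputes
    it with the locally configured key.
    """
    return hmac.new(key, packet, hashlib.sha256).digest()

def verify(packet: bytes, digest: bytes, key: bytes) -> bool:
    """Receiver side: discard the packet if the digests do not match.
    compare_digest() is a constant-time comparison, which avoids leaking
    information through timing differences."""
    return hmac.compare_digest(authenticate(packet, key), digest)
```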
RIP authentication ensures network security by adding an authentication field to a packet
before the packet is sent. After receiving a RIP packet from a remote router, the local router
discards the packet if the authentication password in the packet does not match the local
authentication password. This mechanism protects the local router.
On IP networks of carriers, RIP authentication ensures the secure transmission of packets,
improves the system security, and provides secure network services for carriers.
1.10.5 RIPng
1.10.5.1 Introduction
Definition
RIP next generation (RIPng) is an extension to RIP version 2 (RIP-2) on IPv6 networks. Most
RIP concepts apply to RIPng.
RIPng is a distance-vector routing protocol, which measures the distance (metric or cost) to
the destination host by the hop count. In RIPng, the hop count from a device to its directly
connected network is 0, and the hop count from a device to a network that is reachable
through another device is 1. When the hop count is equal to or exceeds 16, the destination
network or host is defined as unreachable.
To be applied on IPv6 networks, RIPng makes the following changes to RIP:
UDP port number: RIPng uses UDP port number 521 to send and receive routing
information.
Multicast address: RIPng uses FF02::9 as the link-local multicast address of a RIPng
device.
Prefix length: RIPng uses a 128-bit (the mask length) prefix in the destination address.
Next hop address: RIPng uses a 128-bit IPv6 address.
Source address: RIPng uses a link-local address (FE80::/10) as the source address of
RIPng Update packets.
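The addressing changes in the list above can be checked with Python's standard `ipaddress` module. The constants mirror the values given above; `valid_ripng_source` is a hypothetical helper name:

```python
import ipaddress

# RIPng transport and addressing constants from the list above.
RIPNG_UDP_PORT = 521
RIPNG_MULTICAST_GROUP = ipaddress.ip_address("FF02::9")
LINK_LOCAL_PREFIX = ipaddress.ip_network("FE80::/10")

def valid_ripng_source(addr: str) -> bool:
    """A RIPng Update packet must be sourced from a link-local
    (FE80::/10) address (sketch)."""
    return ipaddress.ip_address(addr) in LINK_LOCAL_PREFIX
```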
Purpose
RIPng is an extension to RIP for support of IPv6.
1.10.5.2 Principles
RIPng is an extension to RIP-2 on IPv6 networks and uses the same timers as RIP-2. RIPng
supports split horizon, poison reverse, and triggered update, which prevent routing loops.
1.10.5.2.2 Timers
RIPng uses the following timers:
Update timer: This timer periodically triggers Update packet transmission. By default,
the interval at which Update packets are sent is 30s. This timer is used to synchronize
RIPng routes on the network.
Age timer: If a RIPng device does not receive any Update packet from its neighbor
before a route expires, the RIPng device considers the route to its neighbor unreachable.
Garbage-collect timer: If no packet is received to update an unreachable route after the
Age timer expires, this route is deleted from the RIPng routing table.
Hold-down timer: If a RIPng device receives an updated route with cost 16 from a
neighbor, the route enters the holddown state, and the hold-down timer is started.
The following describes the relationship among these timers:
The advertisement of RIPng routing updates is periodically triggered by the update timer,
whose default value is 30 seconds. Each routing entry is associated with two timers: the Age timer and
garbage-collect timer. Each time a route is learned and added to the routing table, the Age
timer is started. If no Update packet is received from the neighbor within 180 seconds, the
metric of the route is set to 16, and the garbage-collect timer is started. If no Update packet is
received within 120 seconds, the route is deleted after the garbage-collect timer expires.
By default, the hold-down timer is disabled. If you configure a hold-down timer, it starts after
the system receives a route with a cost of 16 or greater from its neighbor.
On the network shown in Figure 1-740, after Device B sends a route to network 123::45 to
Device A, Device A does not send the route back to Device B.
In Figure 1-741, if poison reverse is not configured, Device B sends Device A a route learned
from Device A. The cost of the route from Device A to network 123::0/64 is 1. If the route
from Device A to network 123::0/64 becomes unreachable and Device B does not receive an
Update packet from Device A and keeps sending Device A the route from Device A to
network 123::0/64, a routing loop occurs.
With poison reverse, after Device B receives the route from Device A, Device B sends a route
unreachable message to Device A with cost 16. Device A then no longer learns the reachable
route from Device B, which prevents routing loops.
If both poison reverse and split horizon are configured, only poison reverse takes effect.
In Figure 1-742, if network 123::0 is unreachable, Device C learns the information first. By
default, a RIPng-enabled device sends Update packets to its neighbors every 30 seconds. If
Device C receives an Update packet from Device B within 30s when Device C is still waiting
to send Update packets, Device C learns the incorrect route to 123::0. In this case, the next
hops of the routes from Device B and Device C to 123::0 are Device C and Device B,
respectively, which results in routing loops. If Device C sends an Update packet to Device B
immediately after it detects a network fault, Device B can rapidly update its routing table,
which prevents routing loops.
In addition, if the next hop of a route becomes unavailable due to a link failure, the local
device sets the cost of the route to 16 and then advertises the route immediately to its
neighbors. This process is called route poisoning.
Background
On large networks, the RIPng routing table of each device contains a large number of routes,
which consumes lots of system resources. In addition, if a specific link connected to a device
within an IP address range frequently alternates between Up and Down, route flapping occurs.
To address these problems, RIPng route summarization was introduced. With RIPng route
summarization, a device summarizes routes destined for different subnets of a network
segment into one route destined for one network segment and then advertises the summarized
route to other network segments. RIPng route summarization reduces the number of routes in
the routing table, minimizes system resource consumption, and prevents route flapping.
Implementation
RIPng route summarization is interface-based. After RIPng route summarization is enabled on
an interface, the interface summarizes routes based on the longest matching rule and then
advertises the summarized route. The smallest metric among the specific routes for the
summarization is used as the metric of the summarized route.
For example, an interface has two routes: 11:11:11::24 with metric 2 and 11:11:12::34 with
metric 3. After RIPng route summarization is enabled on the interface, the interface
summarizes the two routes into the route 11::0/16 with metric 2 and then advertises it.
Background
As networks develop, network security has become an increasing concern. Internet Protocol
Security (IPsec) authentication can be used to authenticate RIPng packets. The packets that
fail to be authenticated are discarded, which prevents data transmitted based on TCP/IP from
being intercepted, tampered with, or attacked.
Implementation
IPsec has an open standard architecture and ensures secure packet transmission on the Internet
by encrypting packets. RIPng IPsec provides a complete set of security protection
mechanisms to authenticate RIPng packets, which prevents devices from being attacked by
forged RIPng packets.
IPsec includes a set of protocols that are used at the network layer to ensure data security,
such as Internet Key Exchange (IKE), Authentication Header (AH), and Encapsulating
Security Payload (ESP). The three protocols are described as follows:
IKE: A protocol that negotiates security associations (SAs) and exchanges the keys used
by AH and ESP.
AH: A protocol that provides data origin authentication, data integrity check, and
anti-replay protection. AH does not encrypt packets to be protected.
ESP: A protocol that provides IP packet encryption and authentication mechanisms
besides the functions provided by AH. The encryption and authentication mechanisms
can be used together or independently.
Benefits
RIPng IPsec offers the following benefits:
1.10.6 OSPF
1.10.6.1 Introduction
Definition
Open Shortest Path First (OSPF) is a link-state Interior Gateway Protocol (IGP) developed by
the Internet Engineering Task Force (IETF).
OSPF version 2 (OSPFv2) is intended for IPv4. OSPF version 3 (OSPFv3) is intended for
IPv6.
Purpose
Before the emergence of OSPF, the Routing Information Protocol (RIP) was widely used as
an IGP on networks. RIP is a distance-vector routing protocol. Due to its slow convergence,
routing loops, and poor scalability, RIP is gradually being replaced with OSPF.
Typical IGPs include RIP, OSPF, and Intermediate System to Intermediate System (IS-IS).
Table 1-187 describes differences among the three typical IGPs.
Benefits
OSPF offers the following benefits:
Wide application scope: OSPF applies to medium-sized networks with several hundred
routers, such as enterprise networks.
Network masks: OSPF packets can carry masks, and therefore the packet length is not
limited by natural IP masks. OSPF can process variable length subnet masks (VLSMs).
Fast convergence: When the network topology changes, OSPF immediately sends link
state update (LSU) packets to synchronize the changes to the link state databases
(LSDBs) of all routers in the same autonomous system (AS).
Loop-free routing: OSPF uses the SPF algorithm to calculate loop-free routes based on
the collected link status.
Area partitioning: OSPF allows an AS to be partitioned into areas, which simplifies
management. Routing information transmitted between areas is summarized, which
reduces network bandwidth consumption.
Equal-cost routes: OSPF supports multiple equal-cost routes to the same destination.
Hierarchical routing: OSPF uses intra-area routes, inter-area routes, Type 1 external
routes, and Type 2 external routes, which are listed in descending order of priority.
Authentication: OSPF supports area-based and interface-based packet authentication,
which ensures packet exchange security.
Multicast: OSPF uses multicast addresses to send packets on certain types of links,
which minimizes the impact on other devices.
1.10.6.2 Principles
1.10.6.2.1 Basic Concepts
This section describes the basic Open Shortest Path First (OSPF) concepts.
Router ID
A router ID is a 32-bit unsigned integer, which identifies a router in an autonomous system
(AS). A router ID must exist before the router runs OSPF.
A router ID can be manually configured or automatically obtained.
If no router ID has been configured, the router automatically obtains a router ID using the
following methods in descending order of priority.
1. The router preferentially selects the largest IP address from its loopback interface
addresses as the router ID.
2. If no loopback interface has been configured, the router selects the largest IP address
from its interface IP addresses as the router ID.
A router can obtain a router ID again only after a router ID is reconfigured for the router or an
OSPF router ID is reconfigured and the OSPF process restarts.
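The automatic selection rules above can be sketched in Python with hypothetical names: prefer the largest loopback address, then fall back to the largest of the other interface addresses.

```python
import ipaddress

def select_router_id(loopback_addrs, interface_addrs):
    """Automatic OSPF router ID selection (sketch).

    Prefer the largest IP address among loopback interface addresses; if
    no loopback interface is configured, use the largest of the other
    interface IP addresses.
    """
    pool = loopback_addrs or interface_addrs
    if not pool:
        return None  # no address available; no router ID can be obtained
    return str(max(ipaddress.ip_address(a) for a in pool))
```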
Area
When a large number of routers run OSPF, link state databases (LSDBs) become very large
and require a large amount of storage space. Large LSDBs also complicate shortest path first
(SPF) computation and overload the routers. As the network grows, the network topology
changes, which results in route flapping and frequent OSPF packet transmission. When a
large number of OSPF packets are transmitted, bandwidth usage efficiency decreases, and
each router on a network has to recalculate routes in case of any topology change.
OSPF resolves this problem by partitioning an AS into different areas. An area is regarded as
a logical group, and each group is identified by an area ID. A router, not a link, resides at the
border of an area. A network segment or link can belong only to one area. An area must be
specified for each OSPF interface.
OSPF areas include common areas, stub areas, and not-so-stubby areas (NSSAs). Table 1-188
describes these OSPF areas.
Router Type
Routers are classified as internal routers, ABRs, backbone routers, or ASBRs by location in an
AS. Figure 1-743 shows the four router types.
LSA
OSPF encapsulates routing information into LSAs for transmission. Table 1-190 describes
LSAs and their functions.
Packet Type
OSPF packets are classified as Hello, Database Description (DD), Link State Request (LSR),
Link State Update (LSU), or Link State Acknowledgment (LSAck) packets. Table 1-191
describes OSPF packets and their functions.
Route Type
Route types are classified as intra-area, inter-area, Type 1 external, or Type 2 external routes.
Intra-area and inter-area routes describe the network structure of an AS. Type 1 or Type 2 AS
external routes describe how to select routes to destinations outside an AS.
Table 1-192 describes OSPF routes in descending order of priority.
Network Type
Networks are classified as broadcast, non-broadcast multiple access (NBMA),
point-to-multipoint (P2MP), or point-to-point (P2P) networks by link layer protocol. Table
1-193 describes the network types.
DR and BDR
On broadcast or NBMA networks, any two routers need to exchange routing information. As
shown in Figure 1-744, if n routers are deployed on the network, n x (n - 1)/2 adjacencies must
be established. Any route change on a router is transmitted to other routers, which wastes
bandwidth resources. OSPF resolves this problem by defining a DR and a backup designated
router (BDR). After a DR is elected, all routers send routing information only to the DR. Then
the DR broadcasts LSAs. Routers other than the DR and BDR are called DR others. The DR
others establish adjacencies only with the DR and BDR, not with each other. This process
reduces the number of adjacencies established between routers on broadcast or NBMA
networks.
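The adjacency reduction can be checked with simple arithmetic (the counting functions are illustrative, not part of the protocol):

```python
# Without a DR, n routers on a broadcast segment need n*(n-1)/2 adjacencies.
# With a DR and BDR, only the DR and BDR peer with everyone; the DR-BDR
# adjacency is shared between them and counted once.
def full_mesh_adjacencies(n):
    return n * (n - 1) // 2

def dr_bdr_adjacencies(n):
    # DR: n-1 adjacencies; BDR: n-1 adjacencies; minus the DR-BDR
    # adjacency counted twice.
    return 2 * (n - 1) - 1 if n >= 2 else 0

for n in (5, 10):
    print(n, full_mesh_adjacencies(n), dr_bdr_adjacencies(n))
# 5 routers: 10 vs 7 adjacencies; 10 routers: 45 vs 17 adjacencies.
```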
If the original DR fails, the routers must reelect a DR, and all routers except the new DR must
synchronize routing information with the new DR. This process is lengthy, which may cause
incorrect route calculations. A BDR is used to shorten the process. The BDR is a backup for a
DR. A BDR is elected together with a DR. The BDR establishes adjacencies with all routers
on the network segment and exchanges routing information with them. When the DR fails, the
BDR immediately becomes a new DR. The routers need to reelect a new BDR, but this
process does not affect route calculations.
The DR priority of a router interface determines its eligibility for DR and BDR elections.
Router interfaces with a DR priority greater than 0 are eligible. Each router adds the
elected DR to a Hello packet and sends it to the other routers on the network segment. When
two router interfaces on the same network segment both declare themselves the DR, the
interface with the higher DR priority is elected as the DR. If the two interfaces have the same
DR priority, the interface with the larger router ID is elected as the DR.
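These tie-breaking rules can be sketched as follows. The data structure is illustrative; the real election also considers the DR/BDR declared in received Hello packets:

```python
# Simplified DR election: priority 0 interfaces are ineligible; higher
# DR priority wins; the larger router ID breaks ties.
import ipaddress

def elect_dr(interfaces):
    """interfaces: list of (router_id, dr_priority) tuples. Returns the
    router ID of the elected DR, or None if no interface is eligible."""
    eligible = [i for i in interfaces if i[1] > 0]
    if not eligible:
        return None
    return max(eligible,
               key=lambda i: (i[1], int(ipaddress.IPv4Address(i[0]))))[0]

# Equal priorities: the larger router ID wins; priority 0 is excluded.
print(elect_dr([("1.1.1.1", 1), ("2.2.2.2", 1), ("3.3.3.3", 0)]))  # 2.2.2.2
```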
OSPF Multi-Process
OSPF multi-process allows multiple OSPF processes to independently run on the same router.
Route exchange between different OSPF processes is similar to that between different routing
protocols. A router's interface can belong only to one OSPF process.
OSPF multi-process is typically used on virtual private networks (VPNs) on which OSPF is
deployed between provider edges (PEs) and customer edges (CEs). The OSPF processes on
the PEs are independent of each other.
A router in an area can advertise LSAs carrying a default route only when the router has
an interface connected to a device outside the area.
If a router has advertised LSAs carrying a default route, the router no longer learns the
same type of LSA advertised by other routers, which carry a default route. That is, the
router uses only the LSAs advertised by itself to calculate routes. The LSAs advertised
by other routers are still saved in the LSDB.
If a router must use a route to advertise LSAs carrying an external default route, the
route cannot be a route learned by the local OSPF process. A router in an area uses an
external default route to forward packets outside the area. If the next hops of routes in
the area are routers in the area, packets cannot be forwarded outside the area.
Before a router advertises a default route, it checks whether a neighbor in the full state is
present in area 0. The router advertises a default route only when a neighbor in the full
state is present in area 0. If no such a neighbor exists, the backbone area cannot forward
packets and advertising a default route is meaningless. For the concept of the Full State,
see OSPF Neighbor States.
Table 1-194 describes the principles for advertising default routes in different areas.
Down This is the initial state of a neighbor conversation. This state indicates that a
router has not received any Hello packets from its neighbors within a dead
interval.
Attempt In the Attempt state, a router periodically sends Hello packets to manually
configured neighbors.
NOTE
This state applies only to non-broadcast multiple access (NBMA) interfaces.
Init This state indicates that a router has received Hello packets from its neighbors
but the neighbors did not receive Hello packets from the router.
2-way This state indicates that a router has received Hello packets from its neighbors
and neighbor relationships have been established between the routers.
If no adjacency needs to be established, the neighbors remain in the 2-way
state. If adjacencies need to be established, the neighbors enter the Exstart
state.
Exstart In the Exstart state, routers establish a master/slave relationship to ensure that
DD packets are sequentially exchanged.
Exchange In the Exchange state, routers exchange DD packets. A router uses a DD
packet to describe its own LSDB and sends the packet to its neighbors.
Loading In the Loading state, a router sends Link State Request (LSR) packets to its
neighbors to request their LSAs for LSDB synchronization.
Full In the Full state, a router establishes adjacencies with its neighbors and all
LSDBs have been synchronized.
The neighbor state of the local router may be different from that of the remote router. For example, the
neighbor state of the local router is Full, but the neighbor state of the remote router is Loading.
Adjacency Establishment
Adjacencies can be established in either of the following situations:
Two routers have established a neighbor relationship and communicate for the first time.
The designated router (DR) or backup designated router (BDR) on a network segment
changes.
The adjacency establishment process is different on different networks.
Adjacency establishment on a broadcast network
On a broadcast network, the DR and BDR establish adjacencies with every other router on the
same network segment, whereas DR others establish only neighbor relationships with each other.
Figure 1-746 shows the adjacency establishment process on a broadcast network.
Seen field of 1.1.1.1 (Router A's router ID). Router A has been discovered but its
router ID is less than that of Router B, and therefore Router B regards itself as a DR.
Then Router B's state changes to Init.
c. After Router A receives the packet, Router A's state changes to 2-way.
The following procedures are not performed for DR others on a broadcast network.
2. Master/Slave negotiation and DD packet exchange
a. Router A sends a DD packet to Router B. The packet carries the following fields:
Seq field: The value x indicates the sequence number is x.
I field: The value 1 indicates that the packet is the first DD packet, which is
used to negotiate a master/slave relationship and does not carry LSA
summaries.
M field: The value 1 indicates that the packet is not the last DD packet.
MS field: The value 1 indicates that Router A declares itself a master.
To improve transmission efficiency, Router A and Router B determine which LSAs
in each other's LSDB need to be updated. If one party determines that an LSA of the
other party is already in its own LSDB, it does not send an LSR packet for updating
the LSA to the other party. To achieve the preceding purpose, Router A and Router
B first send DD packets, which carry summaries of LSAs in their own LSDBs.
Each summary identifies an LSA. To ensure packet transmission reliability, a
master/slave relationship must be determined during DD packet exchange. One
party serving as a master uses the Seq field to define a sequence number. The
master increases the sequence number by one each time it sends a DD packet. When
the other party serving as a slave sends a DD packet, it adds the sequence number
carried in the last DD packet received from the master to the Seq field of the packet.
b. After Router B receives the DD packet, Router B's state changes to Exstart and
Router B returns a DD packet to Router A. The returned packet does not carry LSA
summaries. Because Router B's router ID is greater than Router A's router ID,
Router B declares itself a master and sets the Seq field to y.
c. After Router A receives the DD packet, it agrees that Router B is a master and
Router A's state changes to Exchange. Then Router A sends a DD packet to Router
B to transmit LSA summaries. The packet carries the Seq field of y and the MS
field of 0. The value 0 indicates that Router A declares itself a slave.
d. After Router B receives the packet, Router B's state changes to Exchange and
Router B sends a new DD packet containing its own LSA summaries to Router A.
The value of the Seq field carried in the new DD packet is changed to y + 1.
Router A uses the same sequence number as Router B to confirm that it has received DD
packets from Router B. Router B uses the sequence number plus one to confirm that it
has received DD packets from Router A. When Router B sends the last DD packet, it sets
the M field of the packet to 0.
3. LSDB synchronization
a. After Router A receives the last DD packet, it finds that many LSAs in Router B's
LSDB do not exist in its own LSDB, so Router A's state changes to Loading. After
Router B receives the last DD packet from Router A, Router B's state directly
changes to Full, because Router B's LSDB already contains all LSAs of Router A.
b. Router A sends an LSR packet for updating LSAs to Router B. Router B returns an
LSU packet to Router A. After Router A receives the packet, it sends an LSAck
packet for acknowledgement.
The preceding procedures continue until the LSAs in Router A's LSDB are the same as
those in Router B's LSDB. Router A's state changes to Full. After Router A and Router B
exchange DD packets and update all LSAs, they establish an adjacency.
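The Seq field rules used during the DD exchange above can be modeled as a toy trace. The structure and names are illustrative only:

```python
# Toy model of DD sequence numbering: the master increments its sequence
# number with each DD packet it sends; the slave echoes the sequence
# number carried in the last DD packet received from the master.
def dd_exchange(master_start_seq, rounds):
    trace = []
    seq = master_start_seq
    for _ in range(rounds):
        trace.append(("master", seq))   # master sends DD with Seq = seq
        trace.append(("slave", seq))    # slave replies, echoing that Seq
        seq += 1                        # master increments for the next DD
    return trace

print(dd_exchange(100, 2))
# [('master', 100), ('slave', 100), ('master', 101), ('slave', 101)]
```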
Adjacency establishment on an NBMA network
The adjacency establishment process on an NBMA network is similar to that on a broadcast
network. The blue part shown in Figure 1-747 highlights the differences from a broadcast
network.
On an NBMA network, all routers establish adjacencies only with the DR and BDR.
The following procedures are not performed for DR others on an NBMA network.
2. Master/Slave relationship negotiation and DD packet exchange
The procedures for negotiating a master/slave relationship and exchanging DD packets
on an NBMA network are the same as those on a broadcast network.
3. LSDB synchronization
The procedure for synchronizing LSDBs on an NBMA network is the same as that on a
broadcast network.
Adjacency establishment on a point-to-point (P2P)/Point-to-multipoint (P2MP) network
The adjacency establishment process on a P2P/P2MP network is similar to that on a broadcast
network. On a P2P/P2MP network, however, no DR or BDR needs to be elected and DD
packets are transmitted in multicast mode.
Route Calculation
OSPF uses an LSA to describe the network topology. A Type 1 LSA describes the attributes of
a link between routers. A router transforms its LSDB into a weighted, directed graph, which
reflects the topology of the entire AS. All routers in the same area have the same graph. Figure
1-748 shows a weighted, directed graph.
Based on the graph, each router uses the SPF algorithm to calculate an SPT with itself as the
root. The SPT shows routes to nodes in the AS. Figure 1-749 shows an SPT.
When a router's LSDB changes, the router recalculates a shortest path. Frequent SPF
calculations consume a large amount of resources and affect router efficiency. Changing the
interval between SPF calculations can prevent resource consumption caused by frequent
LSDB changes. The default interval between SPF calculations is 5 seconds.
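The SPF calculation over the weighted, directed graph can be sketched with a minimal Dijkstra implementation. The example topology is hypothetical:

```python
# Minimal Dijkstra sketch: compute shortest-path-tree costs from a root
# over a weighted, directed graph, as in the SPF calculation.
import heapq

def spf(graph, root):
    """graph: {node: [(neighbor, cost), ...]}; returns {node: cost}."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for nbr, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

graph = {
    "A": [("B", 10), ("C", 2)],
    "C": [("B", 3), ("D", 8)],
    "B": [("D", 1)],
    "D": [],
}
print(spf(graph, "A"))  # costs: A=0, B=5 (via C), C=2, D=6 (via C and B)
```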
The route calculation process is as follows:
1. A router calculates intra-area routes.
The router uses the SPF algorithm to calculate shortest paths to other routers in an area.
Type 1 and Type 2 LSAs accurately describe the network topology in an area. Based on
the network topology described by a Type 1 LSA, the router calculates paths to other
routers in the area.
If multiple equal-cost routes are produced during route calculation, the SPF algorithm retains all these
routes in the LSDB.
2. The router calculates inter-area routes.
For the routers in an area, the network segment of the routes in an adjacent area is
directly connected to the area border router (ABR). Because the shortest path to the ABR
has been calculated in the preceding step, the routers can directly check a Type 3 LSA to
obtain the shortest path to the network segment. The autonomous system boundary
router (ASBR) can also be considered connected to the ABR. Therefore, the shortest path
to the ASBR can also be calculated in this phase.
If the router performing an SPF calculation is an ABR, the router needs to check only Type 3 LSAs in
the backbone area.
3. The router calculates AS external routes.
AS external routes can be considered to be directly connected to the ASBR. Because the
shortest path to the ASBR has been calculated in the preceding phase, the router can
check Type 5 LSAs to obtain the shortest paths to other ASs.
Route Summarization
When a large OSPF network is deployed, an OSPF routing table includes a large number of
routing entries. To accelerate route lookup and simplify management, configure route
summarization to reduce the size of the OSPF routing table. If a summarized link frequently
alternates between Up and Down, devices outside the summarized network segment are not
affected by the flapping. This prevents route flapping and improves network stability.
Route summarization can be carried out on an ABR or ASBR.
ABR summarization
When an ABR transmits routing information to other areas, it generates Type 3 LSAs for
each network segment. If consecutive network segments exist in this area, you can
summarize these network segments into a single network segment. The ABR generates
one LSA for the summarized network segment and advertises only that LSA.
ASBR summarization
If route summarization has been configured and the local router is an ASBR, the local
router summarizes imported Type 5 LSAs within the summarized address range. If an
NSSA has been configured, the local router also summarizes imported Type 7 LSAs
within the summarized address range.
If the local router is both an ASBR and an ABR, it summarizes Type 5 LSAs translated
from Type 7 LSAs.
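Summarizing consecutive network segments into one advertisement, as an ABR does above, can be illustrated with Python's standard ipaddress module. This shows the idea only, not the device's actual algorithm:

```python
# Collapse four consecutive /24 segments into a single summary prefix.
import ipaddress

segments = [
    ipaddress.ip_network("10.1.0.0/24"),
    ipaddress.ip_network("10.1.1.0/24"),
    ipaddress.ip_network("10.1.2.0/24"),
    ipaddress.ip_network("10.1.3.0/24"),
]
# collapse_addresses merges adjacent and overlapping networks.
summary = list(ipaddress.collapse_addresses(segments))
print(summary)  # [IPv4Network('10.1.0.0/22')]
```

With summarization, the ABR would advertise one Type 3 LSA for 10.1.0.0/22 instead of four LSAs.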
Route Filtering
OSPF routing policies include access control lists (ACLs), IP prefix lists, and route-policies.
For details about these policies, see the section "Routing Policy" in the NE20E Feature
Description - IP Routing.
OSPF route filtering applies in the following aspects:
Route import
OSPF can import the routes learned by other routing protocols. A router uses a
configured routing policy to filter routes and imports only the routes matching the
routing policy. Only an ASBR can import routes, and therefore a routing policy for
importing routes must be configured on the ASBR.
Advertising of imported routes
A router advertises imported routes to its neighbors. Only an ASBR can import routes,
and therefore a routing policy for the advertising of imported routes must be configured
on the ASBR.
If OSPF imports a large number of external routes and advertises them to a device with a
smaller routing table capacity, the device may restart unexpectedly. To address this
problem, configure a limit on the number of LSAs generated when an OSPF process
imports external routes.
Route learning
A router uses a routing policy to filter received intra-area, inter-area, and AS external
routes. The router adds only the routes matching the routing policy to its routing table.
This filtering affects only the local routing table; all routes can still be advertised from
the OSPF routing table. Because the router filters only routes calculated based on LSAs,
the learned LSAs remain complete.
Inter-area LSA learning
An ABR in an area can be configured to filter Type 3 LSAs advertised to the area. The
ABR can advertise only Type 3 LSAs, and therefore a routing policy for inter-area LSA
learning must be configured on the ABR.
During inter-area LSA learning, the ABR directly filters Type 3 LSAs advertised to the
area.
Inter-area LSA advertising
An ABR in an area can be configured to filter Type 3 LSAs advertised to other areas. The
ABR can advertise only Type 3 LSAs, and therefore a routing policy for inter-area LSA
advertising must be configured on the ABR.
The maximum number of external routes configured for all devices in the OSPF AS must be the same.
When the number of external routes in the LSDB reaches the maximum number, the device
enters the overflow state and starts the overflow timer at the same time. The device
automatically exits from the overflow state after the overflow timer expires. Table 1-196
describes the operations performed by the device after it enters or exits from the overflow
state.
Table 1-196 Operations performed by the device after it enters or exits from the overflow state
Background
All non-backbone areas must be connected to the backbone area during OSPF deployment to
ensure that all areas are reachable.
In Figure 1-750, area 2 is not connected to area 0 (backbone area), and Device B is not an
ABR. Therefore, Device B does not generate routing information about network 1 in area 0,
and Device C does not have a route to network 1.
Some non-backbone areas may not be connected to the backbone area. You can configure an
OSPF virtual link to resolve this issue.
Related Concepts
A virtual link refers to a logical channel established between two ABRs over a non-backbone
area.
A virtual link must be configured at both ends of the link.
The non-backbone area involved is called a transit area.
A virtual link is similar to a point-to-point (P2P) connection established between two ABRs.
You can configure interface parameters, such as the interval at which Hello packets are sent,
at both ends of the virtual link as you do on physical interfaces.
Principles
In Figure 1-751, two ABRs use a virtual link to directly transmit OSPF packets. The device
between the two ABRs only forwards packets. Because the destination of OSPF packets is not
the device, the device transparently transmits the OSPF packets as common IP packets.
1.10.6.2.5 OSPF TE
OSPF Traffic Engineering (TE) is developed based on OSPF to support Multiprotocol Label
Switching (MPLS) TE and establish and maintain TE LSPs. In the MPLS TE architecture
described in "MPLS Feature Description", OSPF functions as the information advertising
component, responsible for collecting and advertising MPLS TE information.
In addition to the network topology, TE needs to know network constraints, such as the
bandwidth, TE metric, administrative group, and affinity attribute. However, current OSPF
functions cannot meet these requirements. Therefore, OSPF introduces a new type of LSAs to
advertise network constraints. Based on the network constraints, the Constraint Shortest Path
First (CSPF) algorithm can calculate the path subject to specified constraints.
TE-LSA
OSPF uses a new type of LSA (Type 10 opaque LSA) to collect and advertise TE information.
Type 10 opaque LSAs contain the link status information required by TE, including the
maximum link bandwidth, maximum reservable bandwidth, current reserved bandwidth, and
link color. Based on the OSPF flooding mechanism, Type 10 opaque LSAs synchronize link
status information among devices in an area to form a uniform TEDB for route calculation.
OSPF SRLG
OSPF supports the applications of the Shared Risk Link Group (SRLG) in MPLS by
obtaining information about the TE SRLG flooded among devices in an area. For details, refer
to the chapter "MPLS" in this manual.
Definition
As an extension of OSPF, OSPF VPN enables Provider Edges (PEs) and Customer Edges
(CEs) in VPNs to run OSPF for interworking and use OSPF to learn and advertise routes.
Purpose
As a widely used IGP, in most cases, OSPF runs in VPNs. If OSPF runs between PEs and CEs,
and PEs use OSPF to advertise VPN routes to CEs, no other routing protocols need to be
configured on CEs for interworking with PEs, which simplifies management and
configuration of CEs.
Figure 1-753 Networking with OSPF running between PEs and CEs
The routes that PE1 receives from CE1 are advertised to CE3 and CE4 as follows:
1. PE1 imports OSPF routes of CE1 into BGP and converts them to BGP VPNv4 routes.
2. PE1 uses MP-BGP to advertise the BGP VPNv4 routes to PE2.
3. PE2 imports the BGP VPNv4 routes into OSPF and then advertises these routes to CE3
and CE4.
The process of advertising routes of CE4 or CE3 to CE1 is the same as the preceding process.
A non-backbone area (Area 1) is configured between PE1 and CE1, and a backbone area
(Area 0) is configured in Site 1. The backbone area in Site 1 is separated from the VPN
backbone area. To ensure that the backbone areas are contiguous, a virtual link is configured
between PE1 and CE1.
OSPF Domain ID
If inter-area routes are advertised between local and remote OSPF areas, these areas are
considered to be in the same OSPF domain.
Domain IDs identify domains.
Each OSPF domain has one or more domain IDs. If more than one domain ID is
available, one of the domain IDs is a primary ID, and the others are secondary IDs.
If an OSPF instance does not have a specific domain ID, its domain ID is considered null.
Before advertising the remote routes sent by BGP to CEs, PEs need to determine the type of
OSPF routes (Type 3, Type 5, or Type 7) to be advertised to CEs based on domain IDs.
If local domain IDs are the same as or compatible with remote domain IDs in BGP
routes, PEs advertise Type 3 routes.
If local domain IDs are different from or incompatible with remote domain IDs in BGP
routes, PEs advertise Type 5 or Type 7 routes.
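The decision described above can be sketched as follows, assuming each domain is represented by its set of domain IDs (primary plus secondaries); the function and representation are illustrative:

```python
# Choose the LSA type a PE uses when advertising a remote BGP route to a CE,
# based on whether the local and remote domain IDs are compatible.
def lsa_type_for_remote_route(local_ids, remote_ids, nssa=False):
    if not local_ids and not remote_ids:
        compatible = True                       # both domain IDs are null
    else:
        compatible = bool(set(local_ids) & set(remote_ids))
    if compatible:
        return 3                                # inter-area (Type 3) route
    return 7 if nssa else 5                     # external (Type 5/7) route

print(lsa_type_for_remote_route({"0:1"}, {"0:1", "0:2"}))      # 3
print(lsa_type_for_remote_route({"0:1"}, {"0:9"}, nssa=True))  # 7
```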
In Figure 1-755, on PE1, OSPF imports a BGP route destined for 10.1.1.1/32 and then
generates and advertises a Type 5 or Type 7 LSA to CE1. Then, CE1 learns an OSPF route
with 10.1.1.1/32 as the destination address and PE1 as the next hop and advertises the route to
PE2. Therefore, PE2 learns an OSPF route with 10.1.1.1/32 as the destination address and
CE1 as the next hop.
Similarly, CE1 also learns an OSPF route with 10.1.1.1/32 as the destination address and PE2
as the next hop. PE1 learns an OSPF route with 10.1.1.1/32 as the destination address and
CE1 as the next hop.
As a result, CE1 has two equal-cost routes with PE1 and PE2 as next hops respectively, and
the next hop of the routes from PE1 and PE2 to 10.1.1.1/32 is CE1, which leads to a routing
loop.
In addition, the priority of an OSPF route is higher than that of a BGP route. Therefore, on
PE1 and PE2, BGP routes to 10.1.1.1/32 are replaced with the OSPF route, and the OSPF
route with 10.1.1.1/32 as the destination address and CE1 as the next hop is active in the
routing tables of PE1 and PE2.
The BGP route is inactive, and therefore, the LSA generated when this route is imported by
OSPF is deleted, which causes the OSPF route to be withdrawn. As a result, no OSPF route
exists in the routing table, and the BGP route becomes active again. This cycle causes route
flapping.
OSPF VPN provides a few solutions to routing loops, as described in Table 1-198.
Exercise caution when disabling routing loop prevention because it may cause routing loops.
During BGP or OSPF route exchanges, routing loop prevention prevents OSPF routing loops
in VPN sites.
In the inter-AS VPN Option A scenario, if OSPF runs between ASBRs to transmit VPN routes,
the remote ASBR may fail to learn the OSPF routes sent by the local ASBR due to the routing
loop prevention mechanism.
In Figure 1-756, inter-AS VPN Option A is deployed with OSPF running between PE1 and
CE1. CE1 sends VPN routes to CE2.
1. PE1 learns routes to CE1 using the OSPF process in a VPN instance, imports these
routes into MP-BGP, and sends the MP-BGP routes to ASBR1.
2. After receiving the MP-BGP routes, ASBR1 imports the routes into the OSPF process in
a VPN instance and generates Type 3, Type 5, or Type 7 LSAs carrying DN bit 1.
3. ASBR2 uses OSPF to learn these LSAs and checks the DN bit of each LSA. After
learning that the DN bit in each LSA is 1, ASBR2 does not add the routes carried in
these LSAs to its routing table.
The routing loop prevention mechanism prevents ASBR2 from learning the OSPF routes sent
from ASBR1. As a result, CE1 cannot communicate with CE3.
To address the preceding problem, use either of the following methods:
Disable the device from setting the DN bit to 1 in LSAs when importing BGP routes
into OSPF. For example, ASBR1 does not set the DN bit to 1 when importing
MP-BGP routes into OSPF. After ASBR2 receives these routes and finds that the DN bit
in the LSAs carrying these routes is 0, ASBR2 adds the routes to its routing table.
Disable the device from checking the DN bit after receiving LSAs. For example, ASBR1
sets the DN bit to 1 in LSAs when importing MP-BGP routes into OSPF. ASBR2,
however, does not check the DN bit after receiving these LSAs.
The preceding methods can be used based on specific types of LSAs. You can configure a
sender to determine whether to set the DN bit to 1 or configure a receiver to determine
whether to check the DN bit in the Type 3 LSAs based on the router ID of the device that
generates the Type 3 LSAs.
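The two workarounds can be sketched as sender- and receiver-side switches. The flags and structures are hypothetical, for illustration only:

```python
# Sender side: build an LSA when importing an MP-BGP route into OSPF.
def import_bgp_route(set_dn_bit=True):
    return {"dn_bit": 1 if set_dn_bit else 0}

# Receiver side: decide whether the carried route enters the routing table.
def accept_lsa(lsa, check_dn_bit=True):
    if check_dn_bit and lsa["dn_bit"] == 1:
        return False  # loop prevention: the route is not installed
    return True

# Default behavior drops the route; either workaround lets it through.
print(accept_lsa(import_bgp_route()))                      # False
print(accept_lsa(import_bgp_route(set_dn_bit=False)))      # True
print(accept_lsa(import_bgp_route(), check_dn_bit=False))  # True
```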
In the inter-AS VPN Option A scenario shown in Figure 1-757, the four ASBRs are fully
meshed and run OSPF. ASBR2 may receive the Type 3, Type 5, or Type 7 LSAs generated on
ASBR4. ASBR2 denies the Type 5 or Type 7 LSAs, because the VPN route tags carried in the
LSAs are the same as the default VPN route tag of the OSPF process on ASBR2. If ASBR2 is
disabled from checking the DN bit in the LSAs, ASBR2 accepts the Type 3 LSAs, and routing
loops may occur.
To address the routing loop problem caused by Type 3 LSAs, ASBR2 can be disabled from
checking the DN bit in the Type 3 LSAs generated by devices with router ID 1.1.1.1 and
router ID 3.3.3.3. After the configuration is complete, if ASBR2 receives Type 3 LSAs sent by
ASBR4 with router ID 4.4.4.4, ASBR2 checks the DN bit and denies these Type 3 LSAs
because the DN bit is set to 1.
Figure 1-757 Networking for fully meshed ASBRs in the inter-AS VPN Option A scenario
Multi-VPN-Instance CE
OSPF multi-instance generally runs on PEs. Devices that run OSPF multi-instance within user
LANs are called Multi-VPN-Instance CEs (MCEs).
Compared with OSPF multi-instance running on PEs, MCEs have the following
characteristics:
MCEs do not need to support OSPF-BGP association.
MCEs establish one OSPF instance for each service. Different virtual CEs transmit
different services, which ensures LAN security at a low cost.
MCEs implement different OSPF instances on a CE. The key to implementing MCEs is
to disable loop detection and calculate routes directly. MCEs also need to use the
received LSAs with the DN bit set to 1 for route calculation.
Background
As defined in OSPF, stub areas cannot import external routes. This mechanism prevents
external routes from consuming the bandwidth and storage resources of routers in stub areas.
If you need to both import external routes and prevent resource consumption caused by
external routes, you can configure not-so-stubby areas (NSSAs).
There are many similarities between NSSAs and stub areas. However, different from stub
areas, NSSAs can import AS external routes into the OSPF AS and advertise the imported
routes in the OSPF AS without learning external routes from other areas on the OSPF
network.
Related Concepts
N-bit
A router uses the N-bit carried in a Hello packet to identify the area type that it supports.
The same area type must be configured for all routers in an area. If routers have different
area types, they cannot establish OSPF neighbor relationships. Some vendors' devices do
not comply with standard protocols and also set the N-bit in OSPF Database
Description (DD) packets. You can manually set the N-bit on a router to interwork with
such devices.
Type 7 LSA
Type 7 LSAs, which describe imported external routes, are introduced to support NSSAs.
Type 7 LSAs are generated by an ASBR in an NSSA and advertised only within the
NSSA. After an ABR in an NSSA receives Type 7 LSAs, it selectively translates Type 7
LSAs into Type 5 LSAs to advertise external routes to other areas on an OSPF network.
Principles
To advertise external routes imported by an NSSA to other areas, a translator must translate
Type 7 LSAs into Type 5 LSAs. Notes for an NSSA are as follows:
By default, the translator is the ABR with the largest router ID in the NSSA.
The propagate bit (P-bit) is used to notify a translator whether Type 7 LSAs need to be
translated.
Only Type 7 LSAs with the P-bit set and a non-zero forwarding address (FA) can be
translated into Type 5 LSAs. An FA indicates that packets to a destination address will be
forwarded to the address specified by the FA.
The loopback interface address in an area is preferentially selected as the FA. If no loopback interface
exists, the address of the interface that is Up and has the largest logical index in the area is selected as
the FA.
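The translation condition can be sketched as a simple eligibility check. The field names are illustrative:

```python
# A Type 7 LSA is translated into a Type 5 LSA only when its P-bit is set
# and its forwarding address (FA) is non-zero.
def translatable(lsa):
    return lsa.get("p_bit") == 1 and lsa.get("fa", "0.0.0.0") != "0.0.0.0"

print(translatable({"p_bit": 1, "fa": "10.1.1.1"}))  # True
print(translatable({"p_bit": 1, "fa": "0.0.0.0"}))   # False: zero FA
print(translatable({"p_bit": 0, "fa": "10.1.1.1"}))  # False: P-bit clear
```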
Advantages
Multiple ABRs may be deployed in an NSSA. To prevent routing loops caused by default
routes, ABRs do not calculate the default routes advertised by each other.
Background
When multicast and an IGP Shortcut-enabled MPLS TE tunnel are configured on a network,
the outbound interface of the route calculated by an IGP may not be a physical interface but a
TE tunnel interface. The TE tunnel interface on the device sends multicast Join packets over a
unicast route to the multicast source address. The multicast Join packets are transparent to the
devices through which the TE tunnel passes. As a result, these transit devices
cannot generate multicast forwarding entries.
To resolve the problem, configure OSPF local multicast topology (MT) to create a multicast
routing table for multicast packet forwarding.
Principles
On the network shown in Figure 1-759, multicast and an IGP Shortcut-enabled MPLS TE
tunnel are configured, and the TE tunnel passes through Device B. As a result, Device B
cannot generate multicast forwarding entries.
Since the TE tunnel is unidirectional, multicast data packets from the multicast source are sent
to a physical interface of Device B. Device B discards these packets, because Device B has no
multicast forwarding entry. As a result, services are interrupted.
After local MT is enabled, if the outbound interface of a calculated route is an IGP
Shortcut-enabled TE tunnel interface, the route management (RM) module creates an
independent Multicast IGP (MIGP) routing table for the multicast protocol, calculates a
physical outbound interface for the route, and adds the route to the MIGP routing table.
Multicast packets are then forwarded along this route.
After receiving multicast Join packets from Client, interface 1 on Device A forwards these
packets to Device B. With local MT enabled, Device B can generate multicast forwarding
entries.
Definition
Bidirectional Forwarding Detection (BFD) is a mechanism to detect communication faults
between forwarding engines.
To be specific, BFD detects the connectivity of a data protocol along a path between two
systems. The path can be a physical link, a logical link, or a tunnel.
In BFD for OSPF, a BFD session is associated with OSPF. The BFD session quickly detects a
link fault and then notifies OSPF of the fault, which speeds up OSPF's response to network
topology changes.
Purpose
A link fault or a topology change causes routers to recalculate routes. Routing protocol
convergence must be as quick as possible to improve network availability. Link faults are
inevitable, and therefore a solution must be provided to quickly detect faults and notify
routing protocols.
BFD for Open Shortest Path First (OSPF) associates BFD sessions with OSPF. After BFD for
OSPF is configured, BFD quickly detects link faults and notifies OSPF of the faults. BFD for
OSPF accelerates OSPF response to network topology changes.
Table 1-199 describes OSPF convergence speeds before and after BFD for OSPF is
configured.
Table 1-199 OSPF convergence speeds before and after BFD for OSPF is configured
Principles
Figure 1-760 shows a typical network topology with BFD for OSPF configured. The
principles of BFD for OSPF are described as follows:
1. OSPF neighbor relationships are established among the three routers.
2. After a neighbor relationship becomes Full, a BFD session is established.
3. The outbound interface on Device A connected to Device B is interface 1. If the link
between Device A and Device B fails, BFD detects the fault and then notifies Device A
of the fault.
4. Device A processes the neighbor Down event and recalculates routes. The new route to
Device B passes through Device C, with interface 2 as the outbound interface.
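The interaction in steps 3 and 4 can be sketched as follows. The classes and method names are illustrative assumptions, not a device API; the point is only that a BFD Down notification makes OSPF tear down the neighbor and rerun route calculation immediately instead of waiting for the dead timer.

```python
# Minimal sketch of BFD for OSPF: when the BFD session bound to an OSPF
# neighbor goes Down, OSPF marks the neighbor Down at once and triggers
# route recalculation. Names here are illustrative.

class OspfNeighbor:
    def __init__(self, router_id):
        self.router_id = router_id
        self.state = "Full"

class OspfProcess:
    def __init__(self):
        self.neighbors = {}
        self.recalculations = 0

    def add_neighbor(self, router_id):
        self.neighbors[router_id] = OspfNeighbor(router_id)

    def on_bfd_down(self, router_id):
        # BFD detects the link fault in milliseconds and notifies OSPF.
        self.neighbors[router_id].state = "Down"
        self.recalculations += 1  # SPF rerun excludes the failed link

ospf = OspfProcess()
ospf.add_neighbor("10.0.0.2")
ospf.on_bfd_down("10.0.0.2")
print(ospf.neighbors["10.0.0.2"].state)  # Down
```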
Definition
Generalized TTL security mechanism (GTSM) is a mechanism that protects services over the
IP layer by checking whether the TTL value in an IP packet header is within a pre-defined
range.
Purpose
On networks, attackers may simulate OSPF packets and keep sending them to a device. After
receiving these packets, the device directly sends them to the control plane for processing
without checking their validity if the packets are destined for the device. As a result, the
control plane is busy processing these packets, resulting in high CPU usage.
GTSM is used to protect the TCP/IP-based control plane against CPU-utilization attacks, such
as CPU-overload attacks.
Principles
GTSM-enabled devices check the TTL value in each received packet against a configured
policy. Packets that fail the check are discarded or, depending on the configured default
action, sent to the control plane, which protects the devices against CPU-utilization attacks. A
GTSM policy involves the
following items:
following items:
Source address of the IP packet sent to the device
VPN instance to which the packet belongs
Protocol number of the IP packet (89 for OSPF, and 6 for BGP)
Source port number and destination port number of protocols above TCP/UDP
Valid TTL range
GTSM is implemented as follows:
For directly connected OSPF neighbors, the TTL value of the unicast protocol packets to
be sent is set to 255.
For multi-hop neighbors, a reasonable TTL range is defined.
The applicability of GTSM is as follows:
GTSM takes effect on unicast packets rather than multicast packets. This is because the
TTL value of multicast packets can only be 255, and therefore GTSM is not needed to
protect against multicast packets.
GTSM does not support tunnel-based neighbors.
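The policy match and TTL check described above can be sketched as follows. The field names and the policy representation are assumptions for illustration; only the logic (match the policy tuple, then verify the TTL range) reflects the text.

```python
# Hedged sketch of a GTSM policy check: match a packet against the policy
# tuple (source address, VPN instance, protocol number, ports) and verify
# that its TTL falls in the valid range. Field names are illustrative.

def gtsm_check(packet, policy):
    keys = ("src", "vpn", "proto", "sport", "dport")
    if any(policy[k] is not None and packet[k] != policy[k] for k in keys):
        return "not-matched"          # policy does not apply to this packet
    lo, hi = policy["ttl_range"]
    return "accept" if lo <= packet["ttl"] <= hi else "drop"

# OSPF (protocol 89) from a directly connected neighbor must arrive with
# TTL 255, since the sender sets TTL 255 and the packet travels one hop.
policy = {"src": "10.1.1.2", "vpn": None, "proto": 89,
          "sport": None, "dport": None, "ttl_range": (255, 255)}
ok = {"src": "10.1.1.2", "vpn": None, "proto": 89,
      "sport": None, "dport": None, "ttl": 255}
spoofed = dict(ok, ttl=250)  # a remote attacker cannot restore a decremented TTL
print(gtsm_check(ok, policy))       # accept
print(gtsm_check(spoofed, policy))  # drop
```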
Definition
Routers periodically send Hello packets through OSPF interfaces. By exchanging Hello
packets, the routers establish and maintain the neighbor relationship, and elect the DR and the
Backup Designated Router (BDR) on the multiple-access network (broadcast or NBMA
network). OSPF uses a Hello timer to control the interval at which Hello packets are sent. A
router can send Hello packets again only after the Hello timer expires. Neighbors keep
waiting to receive Hello packets until the Hello timer expires. This process delays the
establishment of OSPF neighbor relationships or election of the DR and the BDR.
Enabling Smart-discover can solve the preceding problem.
Without Smart-discover:
Hello packets are sent only after the Hello timer expires, at the Hello interval.
Neighbors keep waiting to receive Hello packets within the Dead interval.
With Smart-discover:
Hello packets are sent immediately, regardless of whether the Hello timer expires.
Neighbors can receive packets and change the neighbor state immediately.
Principles
In the following situations, Smart-discover-enabled interfaces can send Hello packets to
neighbors regardless of whether the Hello timer expires:
On broadcast or NBMA networks, neighbor relationships can be established and a DR
and a BDR can be elected rapidly.
− The neighbor status becomes 2-way for the first time or returns to Init from 2-way
or a higher state.
− The interface status of the DR or BDR on a multiple-access network changes.
On P2P or P2MP networks, neighbor relationships can be established rapidly. The
establishment of neighbor relationships on a P2P or P2MP network is the same as that on
a broadcast or NBMA network.
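The triggering conditions above can be condensed into a small decision sketch. The event names are invented labels for the listed conditions, not configuration keywords.

```python
# Tiny sketch of the Smart-discover decision: normally a Hello is sent only
# when the Hello timer fires, but with Smart-discover enabled the interface
# also sends one immediately on the triggering events listed above.
# Event names are illustrative.

TRIGGER_EVENTS = {
    "neighbor-2way-first",      # neighbor reaches 2-way for the first time
    "neighbor-fell-from-2way",  # neighbor returns to Init from 2-way or higher
    "dr-bdr-change",            # DR/BDR interface status change on the segment
}

def should_send_hello(timer_expired, smart_discover, event=None):
    if timer_expired:
        return True  # ordinary periodic Hello
    return smart_discover and event in TRIGGER_EVENTS

print(should_send_hello(False, True, "dr-bdr-change"))   # True
print(should_send_hello(False, False, "dr-bdr-change"))  # False
```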
Background
When a new device is deployed on a network or a device is restarted, network traffic may be
lost during BGP route convergence because IGP routes converge more quickly than BGP
routes.
OSPF-BGP synchronization can address this problem.
Purpose
If a backup link exists, BGP traffic may be lost during traffic switchback because BGP routes
converge more slowly than OSPF routes do.
In Figure 1-761, Device A, Device B, Device C, and Device D run OSPF and establish IBGP
connections. Device C functions as the backup of Device B. When the network is stable, BGP
and OSPF routes converge completely on the router.
In most cases, traffic from Device A to 10.3.1.0/30 passes through Device B. If Device B fails,
traffic is switched to Device C. After Device B recovers, traffic is switched back to Device B.
Because OSPF routes converge faster than BGP routes, OSPF route convergence is complete
while BGP route convergence is still in progress. As a result, Device B does not yet have the
BGP route to 10.3.1.0/30.
When packets from Device A to 10.3.1.0/30 reach Device B, Device B discards them because
it has no route to 10.3.1.0/30, causing packet loss during the switchback.
Principles
If OSPF-BGP synchronization is configured on a device, the device remains as a stub router
during the set synchronization period. During this period, the link metric in the LSA
advertised by the device is set to the maximum value (65535), instructing other OSPF routers
not to use it as a transit router for data forwarding.
In Figure 1-761, OSPF-BGP synchronization is enabled on Device B. In this situation, before
BGP route convergence is complete, Device A keeps forwarding data through Device C rather
than Device B. Traffic switches back only after BGP route convergence on Device B is complete.
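The stub-router behavior during the synchronization period can be sketched as follows. The function signature and the timer model are assumptions for illustration; the constant 65535 is the maximum link metric named in the text.

```python
# Illustrative sketch of OSPF-BGP synchronization: while the synchronization
# period runs and BGP has not converged, the router advertises the maximum
# link metric (65535) in its LSAs so that other OSPF routers do not use it
# as a transit node.

MAX_METRIC = 65535

def advertised_metric(real_metric, sync_period_running, bgp_converged):
    if sync_period_running and not bgp_converged:
        return MAX_METRIC  # remain a stub router, repel transit traffic
    return real_metric     # normal metric once BGP has converged

print(advertised_metric(10, sync_period_running=True, bgp_converged=False))   # 65535
print(advertised_metric(10, sync_period_running=False, bgp_converged=True))   # 10
```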
Background
LDP-IGP synchronization enables the LDP status and the IGP status to go Up simultaneously,
which helps minimize traffic interruption time if a fault occurs.
A network provides active and standby links for redundancy. If the active link fails, both an
IGP route and an LDP LSP switch from the active link to the standby link. After the active
link recovers, the IGP route switches back to the active link earlier than the LDP LSP. Traffic
therefore switches to the IGP route over the active link but is dropped because the LSP is
unreachable over the new active link. To prevent traffic loss, LDP-IGP synchronization can be
configured.
On a network enabled with LDP-IGP synchronization, the IGP keeps advertising the maximum
cost of the route over the new active link to delay IGP route convergence until LDP
converges. Traffic therefore remains on the standby link during this period. The
backup LSP is torn down only after the LSP on the active link is established.
LDP-IGP synchronization involves the following timers:
Hold-max-cost timer
Delay timer
Implementation
In Figure 1-762, a network has both an active and standby link. When the active link
recovers from any fault, traffic is switched from the standby link to the active link.
During the traffic switchback, the backup LSP is torn down, but a new LSP may not yet be
established over the active link when IGP route convergence is complete. This causes a traffic
interruption for a short period of time. To help prevent this problem, LDP-IGP
synchronization can be configured to delay the IGP route switchback until LDP
converges. The backup LSP is not deleted and continues forwarding traffic until an LSP
over the active link is established. The process of LDP-IGP synchronization is as
follows:
a. A link recovers from a fault.
b. An LDP session is set up between LSR2 and LSR3. The IGP advertises the
maximum cost of the active link to delay the IGP route switchback.
c. Traffic is still forwarded along the backup LSP.
d. Once set up, the LDP session transmits Label Mapping messages, and LDP notifies
the IGP to start synchronization.
e. The IGP advertises the normal cost of the active link, and its routes converge on
the original forwarding path. The LSP is reestablished and delivers entries to the
forwarding table.
The whole process usually takes milliseconds.
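Steps a through e reduce to a simple cost-advertisement decision, sketched below. The state model is a deliberate simplification of the real LDP-IGP synchronization state machine, and the names are illustrative.

```python
# Rough sketch of the LDP-IGP synchronization decision on the recovered
# active link: the IGP advertises the maximum cost until LDP reports that
# the session is up and label mappings are exchanged, or until the
# Hold-max-cost timer expires.

MAX_COST = 65535

def igp_cost(real_cost, ldp_synced, hold_max_cost_expired):
    if ldp_synced or hold_max_cost_expired:
        return real_cost  # IGP converges back to the active link
    return MAX_COST       # keep traffic on the backup LSP

print(igp_cost(10, ldp_synced=False, hold_max_cost_expired=False))  # 65535
print(igp_cost(10, ldp_synced=True, hold_max_cost_expired=False))   # 10
```

Setting the Hold-max-cost timer to advertise the maximum cost permanently corresponds to `hold_max_cost_expired` never becoming true until LDP synchronizes.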
If an LDP session between two nodes on an active link fails, the LSP on the active link is
torn down, but the IGP route for the active link is reachable. In this case, traffic fails to
switch from the primary LSP to a backup LSP and is discarded. To prevent this problem,
LDP-IGP synchronization can be configured so that after an LDP session fails, LDP
notifies an IGP of the failure. The IGP advertises the maximum cost of the failed link,
which enables the route to switch from the active link to the standby link in step with the
LSP switchover from the primary LSP to the backup LSP. The process of LDP-IGP
synchronization is as follows:
a. An LDP session between two nodes on an active link fails.
b. LDP notifies an IGP of failure in the LDP session over which the primary LSP is
established. The IGP then advertises the maximum cost along the active link.
c. The IGP route switches to the standby link.
d. A backup LSP is set up over the standby link and then forwarding entries are
delivered.
If the LDP session cannot be reestablished promptly, you can set the Hold-max-cost timer to
permanently advertise the maximum cost, which keeps traffic on the backup link until the
LDP session on the active link is reestablished.
LDP-IGP synchronization state machine
After LDP-IGP synchronization is enabled on an interface, the LDP-IGP synchronization
state machine operates based on the flowchart shown in Figure 1-763.
When OSPF is used, the status transits based on the flowchart shown in Figure 1-763.
When IS-IS is used, the Hold-normal-cost state does not exist. After the Hold-max-cost timer expires,
IS-IS advertises the actual link cost; however, the displayed state is still Hold-max-cost even though
this state no longer applies.
Usage Scenario
Figure 1-764 shows an LDP-IGP synchronization scenario.
On the network shown in Figure 1-764, an active link and a standby link are established.
LDP-IGP synchronization and LDP FRR are deployed.
Benefits
Packet loss is reduced during an active/standby link switchover, improving network
reliability.
Partial Route Calculation (PRC): calculates only the changed routes when the routes on
the network change.
An OSPF intelligent timer: dynamically adjusts its value, such as the route calculation
interval, based on the user's configuration and the frequency at which an event is
triggered, which ensures rapid and stable network operation.
The OSPF intelligent timer uses exponential backoff, so its value can be accurate to
milliseconds.
PRC
When a node changes on the network, the SPF algorithm conventionally recalculates all
routes. This full calculation takes a long time and consumes excessive CPU resources, which
affects the convergence speed. PRC instead recalculates only the routes affected by the change.
In route calculation, a leaf represents a route, and a node represents a router. Either an SPT or
a leaf change causes a route change. The SPT change is irrelevant to the leaf change. PRC
processes routing information as follows:
If the SPT changes, PRC processes the routing information of all leaves on a changed
node.
If the SPT remains unchanged, PRC does not process the routing information on any
node.
If a leaf changes, PRC processes the routing information on the leaf only.
If a leaf remains unchanged, PRC does not process the routing information on any leaf.
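The four rules above can be sketched as a small selection function. The data structures are illustrative: a mapping from nodes (routers) to the leaves (routes) attached to them stands in for the SPT.

```python
# Sketch of the PRC rules: reprocess all leaves on nodes whose SPT
# attachment changed, plus any individually changed leaves; everything
# else is skipped, which is what makes PRC cheaper than a full SPF run.

def prc_affected_routes(changed_nodes, changed_leaves, leaves_by_node):
    affected = set()
    for node in changed_nodes:        # SPT change: all leaves on that node
        affected.update(leaves_by_node.get(node, ()))
    affected.update(changed_leaves)   # leaf change: only that leaf
    return affected

leaves = {"B": {"10.1.0.0/16", "10.2.0.0/16"}, "C": {"10.3.0.0/16"}}
print(sorted(prc_affected_routes({"B"}, {"10.9.0.0/16"}, leaves)))
# ['10.1.0.0/16', '10.2.0.0/16', '10.9.0.0/16']
```

Note that Device C's leaf 10.3.0.0/16 is never touched, because neither its node nor the leaf itself changed.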
Background
If an interface carrying OSPF services alternates between Up and Down, OSPF neighbor
relationship flapping occurs on the interface. During the flapping, OSPF frequently sends
Hello packets to reestablish the neighbor relationship, synchronizes LSDBs, and recalculates
routes. In this process, a large number of packets are exchanged, adversely affecting neighbor
relationship stability, OSPF services, and other OSPF-dependent services, such as LDP and
BGP. OSPF neighbor relationship flapping suppression can address this problem by delaying
OSPF neighbor relationship reestablishment or preventing service traffic from passing
through flapping links.
Related Concepts
Flapping_event: reported when the status of a neighbor relationship on an interface
changes from Full to a non-Full state. The flapping_event triggers flapping detection.
Flapping-count: number of times flapping has occurred.
Detecting-interval: detection interval. The interval is used to determine whether to trigger a
valid flapping_event.
Threshold: flapping suppression threshold. When the flapping_count reaches or exceeds
threshold, flapping suppression takes effect.
Resume-interval: interval for exiting from OSPF neighbor relationship flapping suppression.
If the interval between two successive valid flapping_events is longer than resume-interval,
the flapping_count is reset.
Implementation
Flapping detection
Each OSPF interface on which OSPF neighbor relationship flapping suppression is enabled
starts a flapping counter. If the interval between two successive neighbor status changes from
Full to a non-Full state is shorter than detecting-interval, a valid flapping_event is recorded,
and the flapping_count increases by 1. When the flapping_count reaches or exceeds
threshold, flapping suppression takes effect. If the interval between two successive neighbor
status changes from Full to a non-Full state is longer than resume-interval, the
flapping_count is reset.
The detecting-interval, threshold, and resume-interval are configurable.
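The flapping counter just described can be modeled as follows. The class is an illustrative simplification: the parameter names mirror detecting-interval, resume-interval, and threshold from the text, but the exact bookkeeping on a device may differ.

```python
# Minimal model of OSPF neighbor relationship flapping detection: each
# Full -> non-Full transition is a candidate flapping event. Events closer
# together than detecting-interval increment the counter; a gap longer
# than resume-interval resets it; suppression starts at the threshold.

class FlapDetector:
    def __init__(self, detecting_interval, resume_interval, threshold):
        self.detecting_interval = detecting_interval
        self.resume_interval = resume_interval
        self.threshold = threshold
        self.count = 0
        self.last_event = None

    def on_full_to_non_full(self, now):
        if self.last_event is not None:
            gap = now - self.last_event
            if gap < self.detecting_interval:
                self.count += 1        # valid flapping_event recorded
            elif gap > self.resume_interval:
                self.count = 0         # neighbor has stabilized
        self.last_event = now
        return self.count >= self.threshold  # True -> suppression takes effect

d = FlapDetector(detecting_interval=60, resume_interval=120, threshold=3)
print([d.on_full_to_non_full(t) for t in (0, 10, 20, 30)])
# [False, False, False, True]
```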
Flapping suppression
Flapping suppression works in either Hold-down or Hold-max-cost mode.
Hold-down mode: In the case of frequent flooding and topology changes during neighbor
relationship establishment, interfaces prevent neighbor relationship reestablishment
during Hold-down suppression, which minimizes LSDB synchronization attempts and
packet exchanges.
Hold-max-cost mode: If the traffic forwarding path changes frequently, interfaces use
65535 as the cost of the flapping link during Hold-max-cost suppression, which prevents
traffic from passing through the flapping link.
Flapping suppression can also work first in Hold-down mode and then in Hold-max-cost
mode.
By default, the Hold-max-cost mode takes effect. The mode and suppression duration can be
changed manually.
If an attack causes frequent neighbor relationship flapping, Hold-down mode can minimize
the impact of the attack.
When an interface enters the flapping suppression state, all neighbor relationships on the interface enter
the state accordingly.
Typical Scenarios
Basic scenario
In Figure 1-765, the traffic forwarding path is Device A -> Device B -> Device C -> Device E
before a link failure occurs. After the link between Device B and Device C fails, the
forwarding path switches to Device A -> Device B -> Device D -> Device E. If the neighbor
relationship between Device B and Device C frequently flaps at the early stage of the path
switchover, the forwarding path will be switched frequently, causing traffic loss and affecting
network stability. If the neighbor relationship flapping meets suppression conditions, flapping
suppression takes effect.
If flapping suppression works in Hold-down mode, the neighbor relationship between
Device B and Device C is prevented from being reestablished during the suppression
period, in which traffic is forwarded along the path Device A -> Device B -> Device D
-> Device E.
If flapping suppression works in Hold-max-cost mode, 65535 is used as the cost of the
link between Device B and Device C during the suppression period, and traffic is
forwarded along the path Device A -> Device B -> Device D -> Device E.
Broadcast scenario
In Figure 1-767, four devices are deployed on the same broadcast network using switches, and
the devices are broadcast network neighbors. If Device C flaps due to a link failure, and
Device A and Device B were deployed at different times (for example, Device A was deployed
earlier) or the flapping suppression parameters on Device A and Device B are different,
Device A first detects the flapping and suppresses Device C. Consequently, the Hello packets
sent by Device A do not carry Device C's router ID. However, Device B has not detected the
flapping yet and still considers Device C a valid node. As a result, the DR candidates
identified by Device A are Device B and Device D, whereas the DR candidates identified by
Device B are Device A, Device C, and Device D. Different DR candidates result in a different
DR election result, which may lead to route calculation errors. To prevent this problem in
scenarios where an interface has multiple neighbors, such as on a broadcast, P2MP, or NBMA
network, all neighbors on the interface are suppressed when the status of a neighbor
relationship last changes to ExStart or Down. Specifically, if Device C flaps, Device A,
Device B, and Device D on the broadcast network are all suppressed. After the network
stabilizes and the suppression timer expires, Device A, Device B, and Device D are restored
to normal status.
Multi-area scenario
In Figure 1-768, Device A, Device B, Device C, Device E, and Device F are connected in area
1, and Device B, Device D, and Device E are connected in backbone area 0. Traffic from
Device A to Device F is preferentially forwarded along an intra-area route, and the forwarding
path is Device A -> Device B -> Device C -> Device E -> Device F. When the neighbor
relationship between Device B and Device C flaps and the flapping meets suppression
conditions, flapping suppression takes effect in the default mode (Hold-max-cost).
Consequently, 65535 is used as the cost of the link between Device B and Device C. However,
the forwarding path remains unchanged because intra-area routes take precedence over
inter-area routes during route selection according to OSPF route selection rules. To prevent
traffic loss in multi-area scenarios, configure Hold-down mode to prevent the neighbor
relationship between Device B and Device C from being reestablished during the suppression
period. During this period, traffic is forwarded along the path Device A -> Device B ->
Device D -> Device E -> Device F.
By default, the Hold-max-cost mode takes effect. The mode can be changed to Hold-down manually.
Scenario with both LDP-IGP synchronization and OSPF neighbor relationship flapping
suppression configured
In Figure 1-769, if the link between PE1 and P1 fails, an LDP LSP switchover is implemented
immediately, causing the original LDP LSP to be deleted before a new LDP LSP is established.
To prevent traffic loss, LDP-IGP synchronization needs to be configured. With LDP-IGP
synchronization, 65535 is used as the cost of the link over which a new LSP is to be
established. After the new LSP is established, the original cost takes effect. The original LSP
is then deleted, and LDP traffic is forwarded along the new LSP.
LDP-IGP synchronization and OSPF neighbor relationship flapping suppression work in
either Hold-down or Hold-max-cost mode. If both functions are configured, Hold-down mode
takes precedence over Hold-max-cost mode, followed by the configured link cost. Table
1-201 lists the suppression modes that take effect in different situations.
Table 1-201 Principles for selecting the suppression modes that take effect in different situations
The table compares, for each OSPF neighbor relationship flapping suppression mode (rows),
the mode that takes effect in each LDP-IGP synchronization state (columns): LDP-IGP
synchronization Hold-down mode, LDP-IGP synchronization Hold-max-cost mode, and
exited from LDP-IGP synchronization suppression.
For example, the link between PE1 and P1 frequently flaps in Figure 1-769, and both
LDP-IGP synchronization and OSPF neighbor relationship flapping suppression are
configured. In this case, the suppression mode is selected based on the preceding principles.
No matter which mode (Hold-down or Hold-max-cost) is selected, the forwarding path is PE1
-> P4 -> P3 -> PE2.
Figure 1-769 Scenario with both LDP-IGP synchronization and OSPF neighbor relationship
flapping suppression configured
Background
In OSPF, intra-area links take precedence over inter-area links during route selection even
when the inter-area links are shorter than the intra-area links. Each OSPF interface belongs to
only one area. As a result, even when a high-speed link exists in an area, traffic of another
area cannot be forwarded along the link. A common method used to solve this problem is to
configure multiple sub-interfaces and add them to different areas. However, this method has a
defect that an independent IP address needs to be configured for each sub-interface and then is
advertised, which increases the total number of routes. In this situation, OSPF multi-area
adjacency is introduced.
OSPF multi-area adjacency allows an OSPF interface to be multiplexed by multiple areas so
that a link can be shared by the areas.
Figure 1-770 Traffic forwarding paths before and after OSPF multi-area adjacency is enabled
In Figure 1-770, the link between Device A and Device B in area 1 is a high-speed link.
In Figure 1-770 a, OSPF multi-area adjacency is disabled on Device A and Device B, and
traffic from Device A to Device B in area 2 is forwarded along the low-speed link of Device A
-> Device C -> Device D -> Device B.
In Figure 1-770 b, OSPF multi-area adjacency is enabled on Device A and Device B, and their
multi-area adjacency interfaces belong to area 2. In this case, traffic from Device A to Device
B in area 2 is forwarded along the high-speed link of Device A -> Device B.
OSPF multi-area adjacency has the following advantages:
Allows interface multiplexing, which reduces OSPF interface resource usage in
multi-area scenarios.
Allows link multiplexing, which prevents a traffic detour to low-speed links and
optimizes the OSPF network.
Related Concepts
Multi-area adjacency interface: indicates the OSPF logical interface created when OSPF
multi-area adjacency is enabled on an OSPF-capable interface (main OSPF interface). The
multi-area adjacency interface is also referred to as a secondary OSPF interface. The
multi-area adjacency interface has the following characteristics:
The multi-area adjacency interface and the main OSPF interface belong to different
OSPF areas.
The network type of the multi-area adjacency interface must be P2P. The multi-area
adjacency interface runs an independent interface state machine and neighbor state
machine.
The multi-area adjacency interface and the main OSPF interface share the same interface
index and packet transmission channel. Whether the multi-area adjacency interface or the
main OSPF interface is selected to forward an OSPF packet is determined by the area ID
carried in the packet header and related configuration.
If the interface is P2P, its multi-area adjacency interface sends packets through multicast.
If the interface is not P2P, its multi-area adjacency interface sends packets through
unicast.
Principles
In Figure 1-771, the link between Device A and Device B in area 1 is a high-speed link. In
area 2, traffic from Device A to Device B is forwarded along the low-speed link of Device A
-> Device C -> Device D -> Device B. If you want the traffic from Device A to Device B in
area 2 to be forwarded along the high-speed link of Device A -> Device B, deploy OSPF
multi-area adjacency.
Specifically, configure OSPF multi-area adjacency on the main interfaces of Device A and
Device B to create multi-area adjacency interfaces. The multi-area adjacency interfaces
belong to area 2.
1. An OSPF adjacency is established between Device A and Device B. For details about the
establishment process, see 1.10.6.2.2 Basic Principles.
2. Route calculation is implemented. For details about the calculation process, see
1.10.6.2.2 Basic Principles.
The optimal path in area 2 obtained by OSPF through calculation is the high-speed link of
Device A -> Device B. In this case, the high-speed link is shared by area 1 and area 2.
Background
As networks develop, voice over IP (VoIP) and online video services pose higher
requirements for real-time transmission. Nevertheless, if a primary link fails, OSPF-enabled
devices need to perform multiple operations, including detecting the fault, updating the
link-state advertisement (LSA), flooding the LSA, calculating routes, and delivering
forwarding information base (FIB) entries before switching traffic to a new link. This process
takes much longer than the minimum delay to which users are sensitive, so the
requirements for real-time transmission cannot be met. OSPF IP FRR can solve this problem.
OSPF IP FRR conforms to dynamic IP FRR defined by standard protocols. With OSPF IP
FRR, devices can switch traffic from a faulty primary link to a backup link, protecting against
a link or node failure.
Major Auto FRR techniques include loop-free alternate (LFA), U-turn, Not-Via, Remote LFA,
and MRT, among which OSPF supports only LFA and Remote LFA.
Related Concepts
OSPF IP FRR
OSPF IP FRR refers to a mechanism in which a device uses the loop-free alternate (LFA)
algorithm to compute the next hop of a backup link and stores the next hop together with the
primary link in the forwarding table. If the primary link fails, the device switches the traffic to
the backup link before routes are converged on the control plane. This mechanism minimizes
the traffic interruption duration and its impact on services.
OSPF IP FRR policy
An OSPF IP FRR policy can be configured to filter alternate next hops. Only the alternate
next hops that match the filtering rules of the policy can be added to the IP routing table.
LFA algorithm
A device uses the shortest path first (SPF) algorithm to calculate the shortest path from each
neighbor with a backup link to the destination node. The device then uses the inequalities
defined in standard protocols and the LFA algorithm to calculate the next hop of the loop-free
backup link that has the smallest cost of the available shortest paths.
Remote LFA
LFA Auto FRR cannot be used to calculate alternate links on large-scale networks, especially
on ring networks. Remote LFA Auto FRR addresses this problem by calculating a PQ node
and establishing a tunnel between the source node of a primary link and the PQ node. If the
primary link fails, traffic can be automatically switched to the tunnel, which improves
network reliability.
P space
P space consists of the nodes through which the shortest path trees (SPTs) with the source
node of a primary link as the root are reachable without passing through the primary link.
Extended P space
Extended P space consists of the nodes through which the SPTs with neighbors of a primary
link's source node as the root are reachable without passing through the primary link.
Q space
Q space consists of the nodes through which the SPTs with the destination node of a primary
link as the root are reachable without passing through the primary link.
PQ node
A PQ node exists both in the extended P space and Q space and is used by Remote LFA as the
destination of a protection tunnel.
Node-and-link protection
Node-and-link protection takes effect when the following requirements are met.
In Figure 1-773, traffic flows from Device S to Device D. The primary link is Device
S->Device E->Device D, and the backup link is Device S->Device N->Device D. The
node-and-link protection inequalities are met. With OSPF IP FRR, Device S switches the traffic to the
backup link if the primary link fails, minimizing the traffic interruption duration.
The traffic to be protected flows along a specified link and node and the following conditions
are met:
The link costs meet the inequality: Distance_opt(N, D) < Distance_opt(N, S) +
Distance_opt(S, D).
The interface costs meet the inequality: Distance_opt(N, D) < Distance_opt(N, E) +
Distance_opt(E, D).
Distance_opt(X, Y) indicates the shortest link from X to Y. S stands for a source node, E for the faulty
node, N for a node along a backup link, and D for a destination node.
On the network shown in Figure 1-774, Remote LFA calculates the PQ node as follows:
1. Calculates the SPTs with all neighbors of P1 as roots. The nodes through which the SPTs
are reachable without passing through the primary link form an extended P space. The
extended P space in this example is {PE1, P1, P3, P4}.
2. Calculates the SPTs with P2 as the root and obtains the Q space {PE2, P4}.
3. Selects the PQ node (P4) that exists both in the extended P space and Q space.
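The three steps above can be reproduced with ordinary SPF runs on the topology with and without the protected link: a node belongs to a root's space exactly when its shortest-path distance is unchanged after the P1-P2 link is removed. The link costs below are assumptions chosen so the result matches the example (a node's own space trivially contains the root itself).

```python
import heapq

def dijkstra(adj, root):
    """Shortest-path distances from root over a weighted adjacency map."""
    dist, pq = {root: 0}, [(0, root)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

def adjacency(links, skip=None):
    adj = {}
    for a, b, w in links:
        if skip and {a, b} == set(skip):
            continue  # drop the protected primary link
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))
    return adj

# Illustrative bidirectional link costs: P3-P4 costs 2, all others 1.
links = [("PE1", "P1", 1), ("P1", "P2", 1), ("P1", "P3", 1),
         ("P3", "P4", 2), ("P4", "P2", 1), ("P2", "PE2", 1)]
full = adjacency(links)
cut = adjacency(links, skip=("P1", "P2"))  # topology without the primary link

def space(root):
    """Nodes whose shortest path from root does not need the P1-P2 link."""
    d_full, d_cut = dijkstra(full, root), dijkstra(cut, root)
    return {n for n in d_cut if d_cut[n] == d_full.get(n)}

ext_p = space("PE1") | space("P3")  # roots: neighbors of P1
q = space("P2")                     # root: far end of the protected link
print(sorted(ext_p & q))            # the PQ node: ['P4']
```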
OSPF anti-microloop
In Figure 1-774, OSPF remote LFA FRR is enabled, the primary link is PE1 -> P1 -> P2 ->
PE2, and the backup link is PE1 -> P1 -> P3 -> P4 -> P2 -> PE2, and the link P1 -> P3 -> P4
is an LDP tunnel. If the primary link fails, traffic is switched to the backup link, and then
another round of the new primary link calculation begins. Specifically, after P1 completes
route convergence, its next hop becomes P3. However, the route convergence on P3 is slower
than that on P1, and P3's next hop is still P1. As a result, a temporary loop occurs between P1
and P3. OSPF anti-microloop can address this problem by delaying P1 from switching its next
hop until the next hop of P3 becomes P4. Then traffic is switched to the new primary link
(PE1 -> P1 -> P3 -> P4 -> P2 -> PE2), and on the link P1 -> P3 -> P4, traffic is forwarded
based on IP routes.
In a multi-source routing scenario, OSPF FRR is implemented by calculating the Type 3 LSAs
advertised by ABRs of an area for intra-area, inter-area, ASE, or NSSA routing. Inter-area routing is
used as an example to describe how OSPF FRR in a multi-source routing scenario works.
In Figure 1-775, Device B and Device C function as ABRs to forward area 0 and area 1 routes.
Device E advertises an intra-area route. Upon receipt of the route, Device B and Device C
translate it to a Type 3 LSA and flood the LSA to area 0. After OSPF FRR is enabled on
Device A, Device A considers Device B and Device C as its neighbors. Without a fixed
neighbor as the root node, Device A fails to calculate FRR backup next hop. To address this
problem, a virtual node is simulated between Device B and Device C and used as the root
node of Device A, and Device A uses the LFA or remote LFA algorithm to calculate the
backup next hop. This solution converts multi-source routing into single-source routing.
For example, both Device B and Device C advertise the 100.1.1.0/24 route. After Device A
receives the route, it fails to calculate a backup next hop for the route due to a lack of a fixed
root node. To address this problem, a virtual node is simulated between Device B and Device
C and used as the root node of Device A. The cost of the Device B-virtual node link is 0, and
the cost of the Device C-virtual node link is 5. The cost of the virtual node-Device B or
Device C link is the maximum value (65535). If the virtual node advertises the 100.1.1.0/24
route, it will use the smaller cost of the routes advertised by Device B and Device C as the
cost of the route. Device A is configured to consider Device B and Device C as invalid
sources of the 100.1.1.0/24 route and use the LFA or remote LFA algorithm to calculate the
backup next hop for the route, with the virtual node as the root node.
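The virtual node's cost selection described above can be sketched as follows (illustrative
Python, not device code; the cost values are assumptions for the example):

```python
# Sketch of the virtual node's route-cost selection in multi-source OSPF FRR.
# Assumption: route costs are plain integers, as in the example above.

def virtual_node_route_cost(advertised_costs):
    """The virtual node advertises the route with the smaller of the costs
    advertised by the real ABRs (Device B and Device C in the example)."""
    return min(advertised_costs)

# Device B and Device C both advertise 100.1.1.0/24 with different costs;
# the virtual node re-advertises the route with the smaller cost.
print(virtual_node_route_cost([10, 15]))  # 10
```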
In a multi-source routing scenario, OSPF FRR can use the LFA or remote LFA algorithm.
When OSPF FRR uses the remote LFA algorithm, PQ node selection has the following
restrictions:
An LDP LSP will be established between a faulty node and a PQ node, and a virtual
node in a multi-source routing scenario cannot transmit traffic through LDP LSPs. As a
result, the virtual node cannot be selected as a PQ node.
The destination node is not used as a PQ node. After a virtual node is added to a
multi-source routing scenario, the destination node becomes the virtual node. As a result,
the nodes directly connected to the virtual node cannot be selected as PQ nodes.
Derivative Functions
If you bind a Bidirectional Forwarding Detection (BFD) session with OSPF IP FRR, the BFD
session goes Down if BFD detects a link fault. If the BFD session goes Down, OSPF IP FRR
is triggered to switch traffic from the faulty link to the backup link, which minimizes the loss
of traffic.
AuType field values:
1: simple authentication
2: ciphertext authentication
MD5 authentication data is added to an OSPF packet and is not included in the Authentication
field.
Hello Packet
Hello packets are commonly used packets, which are periodically sent by OSPF interfaces to
establish and maintain neighbor relationships. A Hello packet includes information about the
designated router (DR), backup designated router (BDR), timers, and known neighbors.
Figure 1-778 shows the format of a Hello packet.
RouterDeadInterval (32 bits): Dead interval. If a router does not receive any Hello packets
from its neighbors within the dead interval, the neighbors are considered Down.
Designated Router (32 bits): Interface address of the DR.
Backup Designated Router (32 bits): Interface address of the BDR.
Neighbor (32 bits): Router ID of the neighbor.
Table 1-204 lists the address types, interval types, and default intervals used when Hello
packets are transmitted on different networks.
To establish neighbor relationships between routers on the same network segment, set the same
HelloInterval, PollInterval, and RouterDeadInterval values for the routers. PollInterval applies only to
NBMA networks.
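The consistency requirement in the note can be expressed as a simple compatibility check
(illustrative Python sketch; the dictionary field names are assumptions, not the actual
packet-field names):

```python
# Sketch: two routers on the same segment can become neighbors only if their
# Hello timers match. PollInterval is compared only on NBMA networks.

def hello_timers_compatible(local, remote, nbma=False):
    """Return True if the Hello timers allow a neighbor relationship."""
    if local["hello_interval"] != remote["hello_interval"]:
        return False
    if local["router_dead_interval"] != remote["router_dead_interval"]:
        return False
    if nbma and local["poll_interval"] != remote["poll_interval"]:
        return False
    return True

a = {"hello_interval": 10, "router_dead_interval": 40, "poll_interval": 120}
b = {"hello_interval": 10, "router_dead_interval": 40, "poll_interval": 120}
print(hello_timers_compatible(a, b, nbma=True))  # True
```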
DD Packet
During an adjacency initialization, two routers use DD packets to describe their own link state
databases (LSDBs) for LSDB synchronization. A DD packet contains the header of each LSA
in an LSDB. An LSA header uniquely identifies an LSA. The LSA header occupies only a
small portion of the LSA, which reduces the amount of traffic transmitted between routers. A
neighbor can use the LSA header to check whether it already has the LSA. When two routers
exchange DD packets, one functions as the master, and the other functions as the slave. The
master defines a start sequence number and increases the sequence number by one each time
it sends a DD packet. After the slave receives a DD packet, it uses the sequence number
carried in the DD packet for acknowledgement.
Figure 1-779 shows the format of a DD packet.
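The master/slave sequence-number exchange can be sketched as follows (a minimal Python
illustration; real DD packets also carry flags and options not modeled here):

```python
# Sketch: the master defines a start sequence number and increases it by one
# per DD packet sent; the slave echoes the received sequence number back as
# an implicit acknowledgment.

class DDMaster:
    def __init__(self, start_seq=1000):
        self.seq = start_seq

    def next_dd(self, lsa_headers):
        pkt = {"seq": self.seq, "lsa_headers": lsa_headers}
        self.seq += 1                      # incremented per DD packet sent
        return pkt

class DDSlave:
    def ack(self, dd_packet):
        # Acknowledge by echoing the sequence number just received.
        return {"seq": dd_packet["seq"]}

master, slave = DDMaster(), DDSlave()
dd = master.next_dd(["header-of-lsa-1"])
print(slave.ack(dd)["seq"])  # 1000
```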
LSR Packet
After two routers exchange DD packets, they send LSR packets to request each other's LSAs.
The LSR packets contain the summaries of the requested LSAs. Figure 1-780 shows the
format of an LSR packet.
The LS type, Link State ID, and Advertising Router fields can uniquely identify an LSA. If two LSAs
have the same LS type, Link State ID, and Advertising Router fields, a router uses the LS sequence
number, LS checksum, and LS age fields to obtain a required LSA.
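The identification and tie-breaking described in the note can be sketched as follows
(simplified Python; the full comparison rules, including MaxAge handling and sequence
wraparound, are defined in the OSPF specification):

```python
# Sketch: LSA identity and a simplified "which instance is newer" comparison.

def lsa_key(lsa):
    """(LS type, Link State ID, Advertising Router) uniquely identifies an LSA."""
    return (lsa["ls_type"], lsa["link_state_id"], lsa["adv_router"])

def is_newer(a, b):
    """Pick between two instances of the same LSA (simplified rules)."""
    assert lsa_key(a) == lsa_key(b)
    if a["seq"] != b["seq"]:
        return a["seq"] > b["seq"]       # higher LS sequence number wins
    if a["checksum"] != b["checksum"]:
        return a["checksum"] > b["checksum"]
    return a["age"] < b["age"]           # otherwise, the younger instance wins

old = {"ls_type": 1, "link_state_id": "10.0.0.1", "adv_router": "1.1.1.1",
       "seq": 1, "checksum": 700, "age": 600}
new = dict(old, seq=2, age=10)
print(is_newer(new, old))  # True: higher LS sequence number
```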
LSU Packet
A router uses an LSU packet to transmit LSAs requested by its neighbors or to flood its own
updated LSAs. The LSU packet contains a set of LSAs. For multicast and broadcast networks,
LSU packets are multicast to flood LSAs. To ensure reliable LSA flooding, a router uses an
LSAck packet to acknowledge the LSAs contained in an LSU packet that is received from a
neighbor. If an LSA fails to be acknowledged, the router retransmits the LSA to the neighbor.
Figure 1-781 shows the format of an LSU packet.
LSAck Packet
A router uses an LSAck packet to acknowledge the LSAs contained in a received LSU packet.
The LSAs can be acknowledged using LSA headers. LSAck packets can be transmitted over
different links in unicast or multicast mode. Figure 1-782 shows the format of an LSAck
packet.
Router-LSA
A router-LSA describes the link status and cost of a router. Router-LSAs are generated by a
router and advertised within the area to which the router belongs. Figure 1-784 shows the
format of a router-LSA.
Network-LSA
A network-LSA describes the link status of all routers on the local network segment.
Network-LSAs are generated by a DR on a broadcast or non-broadcast multiple access
(NBMA) network and advertised within the area to which the DR belongs. Figure 1-785
shows the format of a network-LSA.
Summary-LSA
A network-summary-LSA describes routes on a network segment in an area. The routes are
advertised to other areas.
An ASBR-summary-LSA describes routes to the ASBR in an area. The routes are advertised
to all areas except the area to which the ASBR belongs.
The two types of summary-LSAs have the same format and are generated by an ABR. Figure
1-786 shows the format of a summary-LSA.
When a default route is advertised, both the Link State ID and Network Mask fields are set to 0.0.0.0.
AS-External-LSA
An AS-external-LSA describes AS external routes. AS-external-LSAs are generated by an
ASBR. Among the five types of LSAs, only AS-external-LSAs can be advertised to all areas
except stub areas and not-so-stubby areas (NSSAs). Figure 1-787 shows the format of an
AS-external-LSA.
When AS-external-LSAs are used to advertise default routes, both the Link State ID and Network Mask
fields are set to 0.0.0.0.
1.10.6.3 Applications
1.10.6.3.1 OSPF GTSM
In Figure 1-788, OSPF runs between the routers, and GTSM is enabled on Device C. The
following are the valid TTL ranges of the packets from all other routers on the network to
Device C:
Device A and Device E are the neighbors of Device C, and the valid TTL range of
packets from Device A and Device E is [255 - hop count + 1, 255].
The valid TTL ranges of the packets from Device B, Device D, and Device F to Device
C are [254, 255], [253, 255], and [252, 255], respectively.
For detailed description of OSPF GTSM, refer to the HUAWEI NE20E-S2 Feature Description -
Security.
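The TTL ranges above follow a single rule, sketched here (illustrative Python; assumes
every router initializes the TTL of its OSPF packets to 255):

```python
# Sketch: a packet originated hop_count hops away from the GTSM-enabled
# device arrives with a TTL in [255 - hop_count + 1, 255].

def gtsm_valid_range(hop_count):
    return (255 - hop_count + 1, 255)

def gtsm_accept(ttl, hop_count):
    lo, hi = gtsm_valid_range(hop_count)
    return lo <= ttl <= hi

print(gtsm_valid_range(2))   # (254, 255): e.g. Device B to Device C
print(gtsm_accept(252, 4))   # True: e.g. Device F to Device C
```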
1.10.7 OSPFv3
1.10.7.1 Introduction
Definition
Open Shortest Path First (OSPF), developed by the Internet Engineering Task Force (IETF), is
a link-state Interior Gateway Protocol (IGP).
At present, OSPF Version 2 (OSPFv2) is used for IPv4, while OSPF Version 3 (OSPFv3),
developed on the basis of OSPFv2, is used for IPv6.
Purpose
The primary purpose of OSPFv3 is to develop a routing protocol independent of any specific
network layer. The internal OSPFv3 router information is redesigned to achieve this purpose.
1.10.7.2 Principles
1.10.7.2.1 OSPFv3 Fundamentals
Running on IPv6, OSPFv3 is an independent routing protocol that is developed on the basis of
OSPFv2.
OSPFv3 and OSPFv2 are the same in terms of the working principles of the Hello packet,
state machine, link-state database (LSDB), flooding, and route calculation.
OSPFv3 packets are encapsulated into IPv6 packets and can be transmitted in unicast or
multicast mode.
LSA Types
Router-LSA (Type 1): Describes the link status and link cost of a router. Router-LSAs are
generated by every router and advertised in the area to which the router belongs.
Network-LSA (Type 2): Describes the link status of all routers on the local network segment.
Network-LSAs are generated by a designated router (DR) and advertised in the area to which
the DR belongs.
Inter-Area-Prefix-LSA (Type 3): Describes routes to a specific network segment in an area.
Inter-Area-Prefix-LSAs are generated by an Area Border Router (ABR) and sent to related
areas.
Inter-Area-Router-LSA (Type 4): Describes routes to an Autonomous System Boundary
Router (ASBR). Inter-Area-Router-LSAs are generated by an ABR and advertised to all
related areas except the area to which the ASBR belongs.
AS-external-LSA (Type 5): Describes routes to a destination outside the AS.
AS-external-LSAs are generated by an ASBR and advertised to all areas except stub areas.
Link-LSA (Type 8): Describes the link-local address and IPv6 address prefixes associated
with the link, as well as the options to be set in the network-LSA originated for the link.
Router Types
Area
When a large number of routers run OSPFv3, LSDBs become very large and require a large
amount of storage space. Large LSDBs also complicate shortest path first (SPF) computation
and are computationally intensive for the routers. Network expansion causes the network
topology to change, which results in route flapping and frequent OSPFv3 packet transmission.
When a large number of OSPFv3 packets are transmitted on the network, bandwidth usage
efficiency decreases. Each change in the network topology causes all routers on the network
to recalculate routes.
OSPFv3 resolves this problem by partitioning an AS into different areas. An area is regarded
as a logical group, and each group is identified by an area ID. A router, not a link, resides at
the border of an area. A network segment or link can belong only to one area. An area must be
specified for each OSPFv3 interface.
OSPFv3 areas include common areas and stub areas, as described in Table 1-217.
Stub Area
Stub areas are specific areas where ABRs do not flood received AS external routes. In stub
areas, routers maintain fewer routing entries and less routing information than the routers in
other areas.
Configuring a stub area is optional. Not every area can be configured as a stub area, because a
stub area is usually a non-backbone area with only one ABR and is located at the AS border.
To ensure the reachability of the routes to destinations outside an AS, the ABR in the stub area
generates a default route and advertises the route to the non-ABRs in the same stub area.
Note the following points when configuring a stub area:
The backbone area cannot be configured as a stub area.
Configure stub area attributes on all routers in the area to be configured as a stub area.
No ASBRs are allowed in the area to be configured as a stub area because AS external
routes cannot be transmitted in the stub area.
OSPFv3 Multi-process
OSPFv3 supports multi-process. Multiple OSPFv3 processes can independently run on the
same router. Route exchange between different OSPFv3 processes is similar to that between
different routing protocols.
Definition
Bidirectional Forwarding Detection (BFD) is a mechanism to detect communication faults
between forwarding engines.
To be specific, BFD detects the connectivity of a data protocol along a path between two
systems. The path can be a physical link, a logical link, or a tunnel.
In BFD for OSPFv3, a BFD session is associated with OSPFv3. The BFD session quickly
detects a link fault and then notifies OSPFv3 of the fault, which speeds up OSPFv3's response
to network topology changes.
Purpose
A link fault or a topology change causes routers to recalculate routes. Routing protocol
convergence must be as quick as possible to improve network availability. Link faults are
inevitable, and therefore a solution must be provided to quickly detect faults and notify
routing protocols.
BFD for Open Shortest Path First version 3 (OSPFv3) associates BFD sessions with OSPFv3.
After BFD for OSPFv3 is configured, BFD quickly detects link faults and notifies OSPFv3 of
the faults. BFD for OSPFv3 accelerates OSPFv3 response to network topology changes.
Table 1-219 describes OSPFv3 convergence speeds before and after BFD for OSPFv3 is
configured.
Table 1-219 OSPFv3 convergence speeds before and after BFD for OSPFv3 is configured
Principles
Figure 1-790 shows a typical network topology with BFD for OSPFv3 configured. The
principles of BFD for OSPFv3 are described as follows:
1. OSPFv3 neighbor relationships are established between these three routers.
2. After a neighbor relationship becomes Full, a BFD session is established.
3. The outbound interface on Device A connected to Device B is interface 1. If the link
between Device A and Device B fails, BFD detects the fault and then notifies Device A
of the fault.
4. Device A processes the event that a neighbor relationship has become Down and
recalculates routes. The new route passes through Device C and reaches Device B, with
interface 2 as the outbound interface.
Background
As networks develop, voice over IP (VoIP) and online video services pose higher
requirements for real-time transmission. Nevertheless, if a primary link fails, OSPFv3-enabled
devices need to perform multiple operations, including detecting the fault, updating the
link-state advertisement (LSA), flooding the LSA, calculating routes, and delivering
forwarding information base (FIB) entries before switching traffic to a new link. This process
takes much longer than the minimum delay to which users are sensitive. As a result, the
requirements for real-time transmission cannot be met.
Principles
With OSPFv3 IP FRR, a device uses the loop-free alternate (LFA) algorithm to compute the
next hop of a backup link and stores the next hop together with the primary link in the
forwarding table. If the primary link fails, the device switches the traffic to the backup link
before routes are converged on the control plane. This mechanism shortens the traffic
interruption duration and minimizes the impact on services. The NE20E supports OSPFv3
Auto FRR.
A device uses the shortest path first (SPF) algorithm to calculate the shortest path from each
neighbor that can provide a backup link to the destination node. The device then uses the
inequalities defined in standard protocols and the LFA algorithm to calculate the next hop of
the loop-free backup link that has the smallest cost of the available shortest paths.
An OSPFv3 Auto FRR policy is used to filter alternate next hops. Only the alternate next hops
that match the filtering rules of the policy can be added to the IP routing table. Users can
configure a desired OSPFv3 FRR policy to filter alternate next hops.
If a Bidirectional Forwarding Detection (BFD) session is bound to OSPFv3 Auto FRR, the
BFD session goes Down if BFD detects a link fault. If the BFD session goes Down, OSPFv3
Auto FRR is triggered on the interface to switch traffic from the faulty link to the backup link,
which minimizes the loss of traffic.
Usage Scenario
OSPFv3 Auto FRR guarantees protection against either a link failure or a node-and-link
failure. Distance_opt (X,Y) indicates the shortest path between node X and node Y.
Link protection: Link protection takes effect when the traffic to be protected flows
along a specified link and the link costs meet the inequality: Distance_opt (N, D) <
Distance_opt (N, S) + Distance_opt (S, D).
− S: source node
− N: node along a backup link
− D: destination node
On the network shown in Figure 1-791, traffic flows from Device S to Device D. The
link cost satisfies the link protection inequality. If the primary link (Device S -> Device
E -> Device D) fails, Device S switches the traffic to the backup link (Device S ->
Device N -> Device E -> Device D), minimizing the traffic interruption duration.
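The link-protection condition can be checked directly from the inequality (illustrative
Python; the distance table values are hypothetical):

```python
# Sketch: the LFA link-protection check. distance_opt(x, y) returns the cost
# of the shortest path between nodes x and y (precomputed table here).

DIST = {
    ("N", "D"): 15,
    ("N", "S"): 10,
    ("S", "D"): 10,
}

def distance_opt(x, y):
    return DIST[(x, y)]

def provides_link_protection(s, n, d):
    """Neighbor N is a loop-free alternate for source S toward destination D
    if Distance_opt(N, D) < Distance_opt(N, S) + Distance_opt(S, D)."""
    return distance_opt(n, d) < distance_opt(n, s) + distance_opt(s, d)

print(provides_link_protection("S", "N", "D"))  # True: 15 < 10 + 10
```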
In a multi-source routing scenario, OSPFv3 FRR is implemented by calculating the Type 3 LSAs
advertised by ABRs of an area for intra-area, inter-area, and ASE routing. Inter-area routing is used
as an example to describe how OSPFv3 FRR in a multi-source routing scenario works.
In Figure 1-793, Device B and Device C function as ABRs to forward area 0 and area 1 routes.
Device E advertises an intra-area route. Upon receipt of the route, Device B and Device C
translate it to a Type 3 LSA and flood the LSA to area 0. After OSPFv3 FRR is enabled on
Device A, Device A considers Device B and Device C as its neighbors. Without a fixed
neighbor as the root node, Device A fails to calculate an FRR backup next hop. To address this
problem, a virtual node is simulated between Device B and Device C and used as the root
node of Device A, and Device A uses the LFA algorithm to calculate the backup next hop.
This solution converts multi-source routing into single-source routing.
For example, both Device B and Device C advertise the 2001:DB8:1::1/64 route. After Device
A receives the route, it fails to calculate a backup next hop for the route due to a lack of a
fixed root node. To address this problem, a virtual node is simulated between Device B and
Device C and used as the root node of Device A. The cost of the Device B-virtual node link is
0, and the cost of the Device C-virtual node link is 5. The cost of the virtual node-Device B or
Device C link is the maximum value (65535). If the virtual node advertises the
2001:DB8:1::1/64 route, it will use the smaller cost of the routes advertised by Device B and
Device C as the cost of the route. Device A is configured to consider Device B and Device C
as invalid sources of the 2001:DB8:1::1/64 route and use the LFA algorithm to calculate the
backup next hop for the route, with the virtual node as the root node.
1.10.7.2.6 OSPFv3 GR
Graceful restart (GR) is a technology used to ensure proper traffic forwarding, especially the
forwarding of key services, during the restart of routing protocols.
Without GR, the master/slave main control board switchover due to various reasons leads to
transient service interruption, and as a result, route flapping occurs on the whole network.
Such route flapping and service interruption are unacceptable on large-scale networks,
especially carrier networks.
Table 1-220 Comparison between master/slave main control board switchovers with and without
GR
Master/slave main control board switchovers without GR:
OSPFv3 neighbor relationships are reestablished.
Routes are recalculated.
FIB entries change.
The entire network detects route changes, and route flapping occurs for a short period of time.
Packets are lost during forwarding, and services are interrupted.
Master/slave main control board switchovers with GR:
OSPFv3 neighbor relationships are reestablished.
Routes are recalculated.
FIB entries remain unchanged.
Except the neighbors of the router on which a master/slave main control board switchover
occurs, other routers do not detect route changes.
No packets are lost during forwarding, and services are not affected.
Definition
As an extension to OSPFv3, OSPFv3 VPN multi-instance enables Provider Edges (PEs) and
Customer Edges (CEs) in VPN networks to run OSPFv3 for interworking and use OSPFv3 to
learn and advertise routes.
Purpose
As a widely used IGP, in most cases, OSPFv3 runs in VPNs. If OSPFv3 runs between PEs
and CEs, and PEs use OSPFv3 to advertise VPN routes to CEs, no other routing protocols
need to be configured on CEs for interworking with PEs, which simplifies management and
configuration of CEs.
Running OSPFv3 between PEs and CEs features the following benefits:
OSPFv3 is used in a site to learn routes. Running OSPFv3 between PEs and CEs can
reduce the number of protocol types supported by CEs.
Similarly, running OSPFv3 both in a site and between PEs and CEs simplifies the work
of network administrators and reduces the number of protocols that network
administrators must be familiar with.
When a network that originally runs OSPFv3 without VPN on the backbone begins to use
BGP/MPLS VPN, running OSPFv3 between PEs and CEs facilitates the transition.
In Figure 1-794, CE1, CE3, and CE4 belong to VPN 1, and the numbers following OSPFv3
indicate the process IDs of the multiple OSPFv3 instances running on PEs.
OSPFv3 Domain ID
If inter-area routes are advertised between local and remote OSPFv3 areas, these areas are
considered to be in the same OSPFv3 domain.
Domain IDs identify domains.
Each OSPFv3 domain has one or more domain IDs. If more than one domain ID is
available, one of the domain IDs is a primary ID, and the others are secondary IDs.
If an OSPFv3 instance does not have a specific domain ID, its domain ID is considered null.
Before advertising the remote routes sent by BGP to CEs, PEs need to determine the type of
OSPFv3 routes (Type 3 or Type 5) to be advertised to CEs based on the domain IDs.
If local domain IDs are the same as or compatible with remote domain IDs in BGP
routes, PEs advertise Type 3 routes.
If local domain IDs are different from or incompatible with remote domain IDs in BGP
routes, PEs advertise Type 5 or Type 7 routes.
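The route-type decision above can be sketched as follows (illustrative Python; "compatible"
is simplified here to shared membership in a domain-ID set, whereas the actual comparison
rules are defined by the OSPFv3 VPN implementation):

```python
# Sketch: which LSA type a PE uses when advertising a remote BGP route to a CE,
# based on whether the local and remote domain IDs match.

def route_type_to_advertise(local_domain_ids, remote_domain_ids):
    """Return the route type a PE advertises to a CE (simplified)."""
    if local_domain_ids & remote_domain_ids:
        return "Type 3"          # same/compatible domains: inter-area route
    return "Type 5 or Type 7"    # different domains: external route

print(route_type_to_advertise({"0.0.0.1"}, {"0.0.0.1", "0.0.0.2"}))  # Type 3
print(route_type_to_advertise({"0.0.0.1"}, {"0.0.0.9"}))  # Type 5 or Type 7
```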
In Figure 1-795, on PE1, OSPFv3 imports a BGP route destined for 2001:db8:1::1/64 and
then generates and advertises a Type 5 or Type 7 LSA to CE1. Then, CE1 learns an OSPFv3
route with 2001:db8:1::1/64 as the destination address and PE1 as the next hop and advertises
the route to PE2. Therefore, PE2 learns an OSPFv3 route with 2001:db8:1::1/64 as the
destination address and CE1 as the next hop.
Similarly, CE1 also learns an OSPFv3 route with 2001:db8:1::1/64 as the destination address
and PE2 as the next hop. PE1 learns an OSPFv3 route with 2001:db8:1::1/64 as the
destination address and CE1 as the next hop.
As a result, CE1 has two equal-cost routes with PE1 and PE2 as next hops respectively, and
the next hops of the routes from PE1 and PE2 to 2001:db8:1::1/64 are CE1, which leads to a
routing loop.
In addition, the priority of an OSPFv3 route is higher than that of a BGP route. Therefore, on
PE1 and PE2, BGP routes to 2001:db8:1::1/64 are replaced with the OSPFv3 route, and the
OSPFv3 route with 2001:db8:1::1/64 as the destination address and CE1 as the next hop is
active in the routing tables of PE1 and PE2.
The BGP route is inactive, and therefore, the LSA generated when this route is imported by
OSPFv3 is deleted, which causes the OSPFv3 route to be withdrawn. As a result, no OSPFv3
route exists in the routing table, and the BGP route becomes active again. This cycle causes
route flapping.
OSPFv3 VPN provides several solutions to routing loops, as described in Table 1-222.
Multi-VPN-Instance CE
OSPFv3 multi-instance generally runs on PEs. Devices that run OSPFv3 multi-instance
within user LANs are called Multi-VPN-Instance CEs (MCEs).
Compared with OSPFv3 multi-instance running on PEs, MCEs have the following
characteristics:
MCEs do not need to support OSPFv3-BGP association.
MCEs establish one OSPFv3 instance for each service. Different virtual CEs transmit
different services, which ensures LAN security at a low cost.
MCEs implement different OSPFv3 instances on a CE. The key to implementing MCEs
is to disable loop detection and calculate routes directly. MCEs also use received LSAs with
the DN bit set to 1 for route calculation.
If Device C fails, traffic is switched to Device B after rerouting. Packets are lost when Device
C recovers.
Because OSPFv3 route convergence is faster than BGP route convergence, OSPFv3
convergence is complete while BGP route convergence is still going on when Device C
recovers. The next hop of the route from Device A to Device D is Device C, which, however,
does not know the route to Device D since BGP convergence on Device C is not complete.
Therefore, Device C discards the packets destined for Device D after receiving them from
Device A, as shown in Figure 1-797.
Figure 1-797 Packet loss during a device restart without OSPFv3-BGP association
IPsec uses two security protocols: Authentication Header (AH) and Encapsulating Security
Payload (ESP).
AH: A protocol that provides data origin authentication, data integrity check, and
anti-replay protection. AH does not encrypt packets to be protected.
AH authentication covers the following IP header fields:
− IP version
− Header length
− Packet length
− Identification
− Protocol
− Source and destination addresses
− Options
ESP: A protocol that provides IP packet encryption and authentication mechanisms
besides the functions provided by AH. The encryption and authentication mechanisms
can be used together or independently.
Background
If the status of an interface carrying OSPFv3 services alternates between Up and Down,
OSPFv3 neighbor relationship flapping occurs on the interface. During the flapping, OSPFv3
frequently sends Hello packets to reestablish the neighbor relationship, synchronizes LSDBs,
and recalculates routes. In this process, a large number of packets are exchanged, adversely
affecting neighbor relationship stability, OSPFv3 services, and other OSPFv3-dependent
services, such as LDP and BGP. OSPFv3 neighbor relationship flapping suppression can
address this problem by delaying OSPFv3 neighbor relationship reestablishment or preventing
service traffic from passing through flapping links.
Related Concepts
Flapping_event: reported when the status of a neighbor relationship on an interface last
changes from Full to a non-Full state. The flapping_event triggers flapping detection.
Flapping_count: number of times flapping has occurred.
Detect-interval: detection interval. The interval is used to determine whether to trigger a
valid flapping_event.
Threshold: flapping suppression threshold. When the flapping_count reaches or exceeds
threshold, flapping suppression takes effect.
Resume-interval: interval for exiting from OSPFv3 neighbor relationship flapping
suppression. If the interval between two successive valid flapping_events is longer than
resume-interval, the flapping_count is reset.
Implementation
Flapping detection
Each OSPFv3 interface on which OSPFv3 neighbor relationship flapping suppression is
enabled starts a flapping counter. If the interval between two successive neighbor status
changes from Full to a non-Full state is shorter than detect-interval, a valid flapping_event is
recorded, and the flapping_count increases by 1. When the flapping_count reaches or exceeds
threshold, flapping suppression takes effect. If the interval between two successive neighbor
status changes from Full to a non-Full state is longer than resume-interval, the flapping_count
is reset.
The detect-interval, threshold, and resume-interval values are configurable.
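The counter behavior can be sketched as follows (illustrative Python; the interval and
threshold defaults shown are assumptions, not the device's actual defaults):

```python
# Sketch of the flapping counter: count status changes from Full to non-Full
# that occur within detect-interval of each other; suppress once the count
# reaches threshold; reset the count after a quiet period of resume-interval.

class FlappingDetector:
    def __init__(self, detect_interval=60, threshold=5, resume_interval=120):
        self.detect_interval = detect_interval    # seconds (assumed default)
        self.threshold = threshold
        self.resume_interval = resume_interval    # seconds (assumed default)
        self.flapping_count = 0
        self.last_event_time = None

    def neighbor_left_full(self, now):
        """Called when a neighbor's state changes from Full to non-Full.
        Returns True if flapping suppression is in effect."""
        if self.last_event_time is not None:
            gap = now - self.last_event_time
            if gap < self.detect_interval:
                self.flapping_count += 1          # valid flapping_event
            elif gap > self.resume_interval:
                self.flapping_count = 0           # quiet period: reset
        self.last_event_time = now
        return self.flapping_count >= self.threshold
```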
Flapping suppression
Flapping suppression works in either Hold-down or Hold-max-cost mode.
Hold-down mode: In the case of frequent flooding and topology changes during neighbor
relationship establishment, interfaces prevent neighbor relationships from being
reestablished during the suppression period, which minimizes LSDB synchronization
attempts and packet exchanges.
Hold-max-cost mode: If the traffic forwarding path changes frequently, interfaces use
65535 as the cost of the flapping link during Hold-max-cost suppression, which prevents
traffic from passing through the flapping link.
Flapping suppression can also work first in Hold-down mode and then in Hold-max-cost
mode.
By default, the Hold-max-cost mode takes effect. The mode and suppression duration can be
changed manually.
If an attack causes frequent neighbor relationship flapping, Hold-down mode can minimize
the impact of the attack.
When an interface enters the flapping suppression state, all neighbor relationships on the interface enter
the state accordingly.
Typical Scenarios
Basic scenario
In Figure 1-799, the traffic forwarding path is Device A -> Device B -> Device C -> Device E
before a link failure occurs. After the link between Device B and Device C fails, the
forwarding path switches to Device A -> Device B -> Device D -> Device E. If the neighbor
relationship between Device B and Device C frequently flaps at the early stage of the path
switchover, the forwarding path will be switched frequently, causing traffic loss and affecting
network stability. If the neighbor relationship flapping meets suppression conditions, flapping
suppression takes effect.
If flapping suppression works in Hold-down mode, the neighbor relationship between
Device B and Device C is prevented from being reestablished during the suppression
period, in which traffic is forwarded along the path Device A -> Device B -> Device D
-> Device E.
If flapping suppression works in Hold-max-cost mode, 65535 is used as the cost of the
link between Device B and Device C during the suppression period, and traffic is
forwarded along the path Device A -> Device B -> Device D -> Device E.
Broadcast scenario
In Figure 1-801, four devices are deployed on the same broadcast network using switches, and
the devices are broadcast network neighbors. If Device C flaps due to a link failure, and
Device A and Device B were deployed at different times (Device A was deployed earlier, for
example) or the flapping suppression parameters on Device A and Device B are different,
Device A first detects the flapping and suppresses Device C. Consequently, the Hello packets
sent by Device A do not carry Device C's router ID. However, Device B has not detected the
flapping yet and still considers Device C a valid node. As a result, the DR candidates
identified by Device A are Device B and Device D, whereas the DR candidates identified by
Device B are Device A, Device C, and Device D. Different DR candidates result in a different
DR election result, which may lead to route calculation errors. To prevent this problem in
scenarios where an interface has multiple neighbors, such as on a broadcast, P2MP, or NBMA
network, all neighbors on the interface are suppressed when the status of a neighbor
relationship last changes to ExStart or Down. Specifically, if Device C flaps, Device A,
Device B, and Device D on the broadcast network are all suppressed. After the network
stabilizes and the suppression timer expires, Device A, Device B, and Device D are restored
to normal status.
Multi-area scenario
In Figure 1-802, Device A, Device B, Device C, Device E, and Device F are connected in area
1, and Device B, Device D, and Device E are connected in backbone area 0. Traffic from
Device A to Device F is preferentially forwarded along an intra-area route, and the forwarding
path is Device A -> Device B -> Device C -> Device E -> Device F. When the neighbor
relationship between Device B and Device C flaps and the flapping meets suppression
conditions, flapping suppression takes effect.
By default, the Hold-max-cost mode takes effect. The mode can be changed to Hold-down manually.
Background
If network-wide OSPFv3 LSAs are flushed, network stability will be adversely affected. In
this case, source tracing must be implemented to locate the root cause of the fault immediately
to minimize the impact. However, OSPFv3 itself does not support source tracing. A
conventional solution is to isolate nodes one by one until the faulty node is located. This solution is
complex and time-consuming. Therefore, a fast source tracing method is required. In this case,
OSPFv3 flush LSA source tracing is introduced, which allows maintenance personnel to
locate the faulty source on any device on the network.
Related Concepts
OSPFv3 flush LSA source tracing
A mechanism that helps locate the device that flushes LSAs. The feature has the following
characteristics:
Uses a new UDP port. Source tracing packets are carried by UDP packets, and the UDP
packets also carry the OSPFv3 LSAs flushed by the current device and are flooded hop
by hop based on the OSPFv3 topology.
Forwards packets along UDP channels which are independent of the channels used to
transmit OSPFv3 packets. Therefore, OSPFv3 flush LSA source tracing supports
incremental deployment. In addition, source tracing does not affect the devices with the
related UDP port disabled.
Supports query of the node that flushed LSAs on any of the devices after OSPFv3 flush
LSA source tracing packets are flooded on the network, which speeds up fault locating
and faulty node isolation.
Is Huawei proprietary.
Flush
Network-wide OSPFv3 LSAs are deleted.
PS-Hello packets
Packets used to negotiate the OSPFv3 flush LSA source tracing capability between OSPFv3
neighbors.
PS-LSA
When a device flushes an OSPFv3 LSA, it generates a PS-LSA carrying information about the
device and brief information about the OSPFv3 LSA.
PS-LSU packets
OSPFv3 flush LSA source tracing packets that carry PS-LSAs.
PS-LSU ACK packets
Acknowledgment packets used to improve the reliability of OSPFv3 flush LSA source tracing packet transmission.
OSPFv3 flush LSA source tracing port
ID of the UDP port that receives and sends OSPFv3 flush LSA source tracing packets. The
default port ID is 50133, which is configurable.
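The transport model can be sketched in a few lines of Python: source tracing packets are ordinary UDP datagrams sent to a dedicated, configurable port, independent of the OSPFv3 protocol channel. (The payload below is a placeholder, not the real PS-LSU format, and IPv4 loopback is used only to make the demo runnable; OSPFv3 source tracing itself runs over the IPv6 topology.)

```python
import socket

# Default OSPFv3 flush LSA source tracing UDP port (configurable on the device).
SOURCE_TRACING_PORT = 50133

# Receiver side: a device with source tracing enabled listens on the dedicated
# port. A device with the port disabled never opens it, so source tracing
# traffic does not affect that device.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", SOURCE_TRACING_PORT))

# Sender side: a source tracing packet is a plain UDP datagram addressed to
# the neighbor's source tracing port.
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"PS-LSU payload (placeholder)", ("127.0.0.1", SOURCE_TRACING_PORT))

data, addr = rx.recvfrom(4096)
tx.close()
rx.close()
```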
Implementation
The implementation of OSPFv3 flush LSA source tracing is as follows:
1. Source tracing capability negotiation
After an OSPFv3 neighbor relationship is established between two devices, they need to
negotiate the source tracing capability through PS-Hello packets.
2. PS-LSA generation and flooding
When a device flushes an OSPFv3 LSA, it generates a PS-LSA carrying information
about the device and brief information about the OSPFv3 LSA, adds the PS-LSA to a
PS-LSU packet, and floods the PS-LSU packet to source tracing-capable neighbors. The
PS-LSU packet is used to locate the faulty source.
When a device receives a PS-LSU packet from a neighbor, the device records the
sequence number of the packet and replies with a PS-LSU ACK packet.
If the device receives a PS-LSU packet whose sequence number is the same as that
recorded for the neighbor, the device discards the packet.
After the device parses a PS-LSU packet, it adds the PS-LSA in the packet to the LSDB.
The device also checks whether the PS-LSA is newer than the corresponding PS-LSA in
the LSDB.
− If the PS-LSA is newer, the device floods it to other neighbors.
− If the PS-LSA is the same as the corresponding PS-LSA in the LSDB, the device
does not process the received PS-LSA.
− If the PS-LSA is older, the device floods the corresponding PS-LSA in the LSDB to
the neighbor.
If the device receives a PS-LSU packet from a neighbor that is currently marked as
source tracing-incapable, the device changes the neighbor status to source tracing-capable.
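The receive-side rules above can be sketched as follows. This is a minimal illustration: the dict-based LSDB, the integer LSA sequence field used for newness comparison, and the flood/ack callbacks are assumptions for the sketch, not Huawei's implementation.

```python
def handle_ps_lsu(lsdb, nbr_seq, neighbor, pkt_seq, ps_lsa, flood, send_ack):
    """Sketch of PS-LSU handling on a source tracing-capable device."""
    send_ack(neighbor, pkt_seq)               # reply with a PS-LSU ACK
    if nbr_seq.get(neighbor) == pkt_seq:      # same sequence number as recorded
        return                                # -> duplicate packet, discard
    nbr_seq[neighbor] = pkt_seq               # record the packet sequence number

    held = lsdb.get(ps_lsa["id"])
    if held is None or ps_lsa["seq"] > held["seq"]:
        lsdb[ps_lsa["id"]] = ps_lsa           # newer: install and flood onward
        flood(ps_lsa, exclude=neighbor)
    elif ps_lsa["seq"] < held["seq"]:
        flood(held, only=neighbor)            # older: return our newer copy
    # equal: the same PS-LSA is already held, so it is not processed further
```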
Typical Scenarios
Scenario where all nodes support source tracing
All nodes on the network support source tracing, and node A is the faulty source. Figure 1-805
shows the networking.
When device A flushes an OSPFv3 LSA, it generates a PS-LSA that carries device A
information and brief information about the flush LSA. Then the PS-LSA is flooded on the
network hop by hop. After the fault occurs, maintenance personnel can log in to any node on
the network to locate device A that keeps sending flush LSAs and isolate device A from the
network.
Scenario where source tracing-incapable nodes are not isolated from source
tracing-capable nodes
All nodes on the network except device C support source tracing, and device A is the faulty
source. In this case, the PS-LSA can be flooded on the entire network. Figure 1-806 shows the
networking.
Figure 1-806 Scenario where source tracing-incapable nodes are not isolated from source
tracing-capable nodes
When device A flushes an OSPFv3 LSA, it generates a PS-LSA that carries device A
information and brief information about the flush LSA. Then the PS-LSA is flooded on the
network hop by hop. When devices B and E negotiate the source tracing capability with
device C, they find that device C does not support source tracing. Therefore, after device B
receives the PS-LSA from device A, device B sends the PS-LSA to device D, but not to device
C. After receiving the flush LSA from device C, device E generates a PS-LSA which carries
information about the advertisement source (device E), flush source (device C), and the flush
LSA, and floods the PS-LSA on the network.
After the fault occurs, maintenance personnel can log in to any device on the network except
device C to locate the faulty node. Two possible faulty nodes can be located in this case: node
A and node C, and they both send the same flush LSA. In this case, device A takes
precedence over device C when the maintenance personnel determine the most possible faulty
source. After device A is isolated, the network recovers.
Scenario where source tracing-incapable nodes are isolated from source tracing-capable
nodes
All nodes on the network except devices C and D support source tracing, and node A is the
faulty source. In this case, the PS-LSA cannot be flooded on the entire network. Figure 1-807
shows the networking.
Figure 1-807 Scenario where source tracing-incapable nodes are isolated from source
tracing-capable nodes
When device A flushes an OSPFv3 LSA, it generates a PS-LSA that carries device A
information and brief information about the flush LSA. However, the PS-LSA can reach only
device B because devices C and D do not support source tracing.
During source tracing capability negotiation, device E finds that device C does not support
source tracing, and device F finds that device D does not support source tracing. After device
E receives the flush LSA from device C, device E helps device C generate and flood a
PS-LSA. Similarly, after device F receives the flush LSA from device D, device F helps
device D generate and flood a PS-LSA.
After the fault occurs:
If maintenance personnel log in to device A or B, the personnel can locate the faulty
source (device A) directly. After device A is isolated, the network recovers.
If the maintenance personnel log in to device E, F, G, or H, the personnel will find that
device E claims device C to be the faulty source and device F claims device D to be the
faulty source.
If the personnel log in to device C or D, they will find that the flush LSA was sent by
device B, not generated by device C or D.
If the personnel log in to device B, they can determine that device A is the faulty source
and isolate device A. After device A is isolated, the network recovers.
Hello packet
Database Description (DD) packet
Link State Request (LSR) packet
Link State Update (LSU) packet
Link State Acknowledgment (LSAck) packet
Hello Packet
Hello packets are commonly used packets, which are periodically sent on OSPFv3 interfaces
to establish and maintain neighbor relationships. A Hello packet includes information about
the designated router (DR), backup designated router (BDR), timers, and known neighbors.
Figure 1-809 shows the format of a Hello packet.
Table 1-227 lists the address types, interval types, and default intervals used when Hello
packets are transmitted on different networks.
To establish neighbor relationships between routers on the same network segment, you must set the
same HelloInterval, PollInterval, and RouterDeadInterval values for the routers. PollInterval applies
only to NBMA networks.
DD Packet
During an adjacency initialization, two routers use DD packets to describe their own link state
databases (LSDBs) for LSDB synchronization. A DD packet contains the header of each LSA
in an LSDB. An LSA header uniquely identifies an LSA. The LSA header occupies only a
small portion of the LSA, which reduces the amount of traffic transmitted between routers. A
neighbor can use the LSA header to check whether it already has the LSA. When two routers
exchange DD packets, one functions as the master and the other functions as the slave. The
master defines a start sequence number. The master increases the sequence number by one
each time it sends a DD packet. After the slave receives a DD packet, it uses the sequence
number carried in the DD packet for acknowledgement.
Figure 1-810 shows the format of a DD packet.
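The master/slave sequence-number rule can be modeled in a few lines. This is a toy illustration of the acknowledgement logic only, not the DD wire format:

```python
class DDMaster:
    """The master defines a start sequence number and increments it by one
    each time it sends a DD packet."""

    def __init__(self, start_seq):
        self.seq = start_seq

    def send_dd(self, lsa_headers):
        pkt = {"seq": self.seq, "headers": lsa_headers}
        self.seq += 1                 # incremented for the next DD packet
        return pkt


class DDSlave:
    """The slave acknowledges by echoing the sequence number it received."""

    def ack(self, dd_packet):
        return {"seq": dd_packet["seq"]}
```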
LSR Packet
After two routers exchange DD packets, they send LSR packets to request each other's LSAs.
The LSR packets contain the summaries of the requested LSAs. Figure 1-811 shows the
format of an LSR packet.
The LS type, Link State ID, and Advertising Router fields can uniquely identify an LSA. If two LSAs
have the same LS type, Link State ID, and Advertising Router fields, a router uses the LS sequence
number, LS checksum, and LS age fields to obtain a required LSA.
LSU Packet
A router uses an LSU packet to transmit LSAs requested by its neighbors or to flood its own
updated LSAs. The LSU packet contains a set of LSAs. For multicast and broadcast networks,
LSU packets are multicast to flood LSAs. To ensure reliable LSA flooding, a router uses an
LSAck packet to acknowledge the LSAs contained in an LSU packet that is received from a
neighbor. If an LSA fails to be acknowledged, the router retransmits the LSA to the neighbor.
Figure 1-812 shows the format of an LSU packet.
LSAck Packet
A router uses an LSAck packet to acknowledge the LSAs contained in a received LSU packet.
The LSAs can be acknowledged using LSA headers. LSAck packets can be transmitted over
different links in unicast or multicast mode. Figure 1-813 shows the format of an LSAck
packet.
Router-LSA
A router-LSA describes the link status and cost of a router. Router-LSAs are generated by a
router and advertised within the area to which the router belongs. Figure 1-815 shows the
format of a router-LSA.
Network-LSA
A network-LSA describes the link status of all routers on the local network segment.
Network-LSAs are generated by a DR on a broadcast or non-broadcast multiple access
(NBMA) network and advertised within the area to which the DR belongs. Figure 1-816
shows the format of a network-LSA.
Inter-Area-Prefix-LSA
An inter-area-prefix-LSA describes routes to a network segment in an area. It is generated
by an area border router (ABR), and the routes are advertised to other areas.
Figure 1-817 shows the format of an inter-area-prefix-LSA.
Inter-Area-Router-LSA
An inter-area-router-LSA describes routes to an ASBR in another area. It is generated by an
ABR, and the routes are advertised to all related areas except the area to which the ASBR belongs.
Figure 1-818 shows the format of an inter-area-router-LSA.
AS-External-LSAs
An AS-external-LSA describes routes to destinations outside the AS. It is originated by an AS boundary router (ASBR).
Figure 1-819 shows the format of an as-external-LSA.
Link-LSAs
Each router generates a link-LSA for each attached link. A link-LSA describes the link-local
address and the IPv6 address prefixes associated with the link, as well as the options to be set
in the network-LSA originated for the link. It is flooded only on the link with which it is associated.
Figure 1-820 shows the format of a Link-LSA.
Intra-Area-Prefix-LSAs
Each router or DR generates one or more intra-area-prefix-LSAs and transmits them in the local
area.
An LSA generated on a router describes the IPv6 address prefix associated with the
router LSA.
An LSA generated on a DR describes the IPv6 address prefix associated with the
network LSA.
Figure 1-821 shows the format of an intra-area-prefix-LSA.
1.10.8 IS-IS
1.10.8.1 Introduction
Definition
Intermediate System to Intermediate System (IS-IS) is a dynamic routing protocol initially
designed by the International Organization for Standardization (ISO) for its Connectionless
Network Protocol (CLNP).
To support IP routing, the Internet Engineering Task Force (IETF) extends and modifies IS-IS
in relevant standards, which enables IS-IS to be applied to both TCP/IP and Open System
Interconnection (OSI) environments. This type of IS-IS is called Integrated IS-IS or Dual
IS-IS.
In this document, IS-IS refers to Integrated IS-IS, unless otherwise stated.
If IS-IS IPv4 and IS-IS IPv6 implement a feature in the same way, details are not provided in this
chapter. For details about the implementation differences, see the 1.10.8.4 Appendixes.
Purpose
As an Interior Gateway Protocol (IGP), IS-IS is used in Autonomous Systems (ASs). IS-IS is
a link state protocol, and it uses the Shortest Path First (SPF) algorithm to calculate routes.
1.10.8.2 Principles
1.10.8.2.1 Basic Concepts of IS-IS
IS-IS Areas
To support large-scale routing networks, IS-IS adopts a two-level structure in a routing
domain. A large domain is divided into areas. Figure 1-822 shows an IS-IS network. The
entire backbone area covers all Level-2 devices in area 1 and Level-1-2 devices in other areas.
Three types of devices on the IS-IS network are described as follows:
Level-1 device
A Level-1 device manages intra-area routing. It establishes neighbor relationships with
only the Level-1 and Level-1-2 devices in the same area and maintains a Level-1 LSDB.
The LSDB contains routing information in the local area. A packet to a destination
beyond this area is forwarded to the nearest Level-1-2 device.
Level-2 device
A Level-2 device manages inter-area routing. It can establish neighbor relationships with
all Level-2 devices and Level-1-2 devices, and maintains a Level-2 LSDB which
contains inter-area routing information.
All Level-2 devices form the backbone network of the routing domain. Level-2 neighbor
relationships are set up between them. They are responsible for communications between
areas. The Level-2 devices in the routing domain must be contiguous to ensure the
continuity of the backbone network. Only Level-2 devices can exchange data packets or
routing information with the devices beyond the routing domain.
Level-1-2 device
A device, which can establish neighbor relationships with both Level-1 devices and
Level-2 devices, is called a Level-1-2 device. A Level-1-2 device can establish Level-1
neighbor relationships with Level-1 devices and Level-1-2 devices in the same area. It
can also establish Level-2 neighbor relationships with Level-2 devices and Level-1-2
devices in other areas. Level-1 devices can be connected to other areas only through
Level-1-2 devices.
A Level-1-2 device maintains two LSDBs: a Level-1 LSDB and a Level-2 LSDB. The
Level-1 LSDB is used for intra-area routing, while the Level-2 LSDB is used for
inter-area routing.
Level-1 devices in different areas cannot establish neighbor relationships. Level-2 devices can establish
neighbor relationships with each other, regardless of the areas to which the Level-2 devices belong.
In general, Level-1 devices are located within an area, Level-2 devices are located between
areas, and Level-1-2 devices are located between Level-1 devices and Level-2 devices.
Interface level
A Level-1-2 device may need to establish only a Level-1 adjacency with one neighbor and
establish only a Level-2 adjacency with another neighbor. In this case, you can set the level of
an interface to control the setting of adjacencies on the interface. Specifically, only Level-1
adjacencies can be established on a Level-1 interface, and only Level-2 adjacencies can be
established on a Level-2 interface.
Area address
The IDP and the High Order DSP (HODSP) of the DSP together identify a routing
domain and the areas in the routing domain; therefore, the combination of the IDP and
the HODSP is referred to as an area address, which is equivalent to an area ID in OSPF.
An area address uniquely identifies an area in a routing domain. The area addresses of
routers in the same Level-1 area must be the same, while the area addresses of routers in
the Level-2 area can be different.
In general, a router can be configured with only one area address. The area address of all
nodes in an area must be the same. In the implementation of a device, an IS-IS process
can be configured with a maximum of three area addresses to support seamless
combination, division, and transformation of areas.
System ID
A system ID uniquely identifies a host or a router in an area. In the device, the length of
the system ID is 48 bits (6 bytes).
A router ID corresponds to a system ID. If a device uses the IP address of Loopback 0
(168.10.1.1) as its router ID, its system ID used in IS-IS can be obtained by performing
the following steps:
− Extend each part of the IP address 168.10.1.1 to 3 digits and add 0 or 0s to the front
of the part that is shorter than 3 digits.
− Divide the extended address 168.010.001.001 into three parts, with each part
consisting of 4 decimal digits.
− The reconstructed 1680.1000.1001 is the system ID.
There are many ways to specify a system ID. Whichever you choose, ensure that the
system ID uniquely identifies a host or a device.
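The three derivation steps above can be expressed as a short routine (a sketch for illustration; the function name is hypothetical):

```python
def ip_to_system_id(router_id: str) -> str:
    """Derive an IS-IS system ID from a dotted-decimal router ID.

    Each octet is zero-padded to three digits, the twelve digits are
    concatenated, and the result is regrouped into three 4-digit parts.
    """
    digits = "".join(f"{int(octet):03d}" for octet in router_id.split("."))
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))
```

For example, the router ID 168.10.1.1 yields the system ID 1680.1000.1001, matching the steps above.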
If the same system ID is configured for more than one device on the same network, network flapping
may occur. To address this problem, IS-IS provides the automatic recovery function. With the function,
if the system detects an IS-IS system ID conflict, it automatically changes the local system ID to resolve
the conflict. The first two bytes of the system ID automatically changed by the system are Fs, and the
last four bytes are randomly generated. For example, FFFF.1234.5678 is such a system ID. If the
conflict persists after the system automatically changes three system IDs, the system no longer resolves
this conflict.
SEL
The role of an SEL (also referred to as NSAP Selector or N-SEL) is similar to that of the
"protocol identifier" field in IP. Each transport protocol corresponds to an SEL value. For IP, the SEL is "00".
NET
A Network Entity Title (NET) indicates the network layer information of an IS itself and
consists of an area ID and a system ID. It does not contain the transport layer
information (SEL = 0). A NET can be regarded as a special NSAP. The length of the
NET field is the same as that of an NSAP, varying from 8 bytes to 20 bytes. For example,
in NET ab.cdef.1234.5678.9abc.00, the area is ab.cdef, the system ID is
1234.5678.9abc, and the SEL is 00.
In general, an IS-IS process is configured with only one NET. When areas need to be
redefined, for example, areas need to be combined or an area needs to be divided into
sub-areas, you can configure multiple NETs.
A maximum of three area addresses can be configured in an IS-IS process, and therefore, you can
configure only a maximum of three NETs. When you configure multiple NETs, ensure that their system
IDs are the same.
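The fixed tail structure of a NET (1-byte SEL preceded by a 6-byte system ID) means its fields can be recovered by counting from the rear, as this illustrative helper shows:

```python
def parse_net(net: str):
    """Split a dotted NET string into (area address, system ID, SEL).

    The last field is the 1-byte SEL, the preceding three 4-digit fields
    form the 6-byte system ID, and everything before that is the area
    address (which may itself contain dots).
    """
    parts = net.split(".")
    sel = parts[-1]
    system_id = ".".join(parts[-4:-1])
    area = ".".join(parts[:-4])
    return area, system_id, sel
```

Applied to the NET ab.cdef.1234.5678.9abc.00 above, it returns the area ab.cdef, the system ID 1234.5678.9abc, and the SEL 00.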
Related Concepts
DIS and Pseudo Node
A Designated Intermediate System (DIS) is an intermediate router elected in IS-IS
communication. A pseudo node simulates a virtual node on a broadcast network and is not a
real router. In IS-IS, a pseudo node is identified by the system ID and 1-byte circuit ID (a
non-zero value) of a DIS.
The DIS is used to create and update pseudo nodes and generate the link state protocol data
units (LSPs) of pseudo nodes. The routers advertise a single link to a pseudo node and obtain
routing information about the entire network through the pseudo node. The router does not
need to exchange packets with all the other routers on the network. Using the DIS and pseudo
nodes simplifies network topology and reduces the length of LSPs generated by routers.
When the network changes, fewer LSPs are generated. Therefore, fewer resources are
consumed.
SPF Algorithm
The SPF algorithm, also named Dijkstra's algorithm, is used in a link-state routing protocol to
calculate the shortest paths to other nodes on a network. In the SPF algorithm, a local router
takes itself as the root and generates a shortest path tree (SPT) based on the network topology
to calculate the shortest path to every destination node on a network. In IS-IS, the SPF
algorithm runs separately in Level-1 and Level-2 databases.
Implementation
All routers on the IS-IS network communicate through the following steps:
Establishment of IS-IS Neighbor Relationships
LSDB Synchronization
Route Calculation
Establishment of IS-IS Neighbor Relationships
On different types of networks, the modes for establishing IS-IS neighbor relationships are
different.
Establishment of a neighbor relationship on a broadcast link
− Device A sends a Level-2 LAN IIH to Device B. After Device B receives the IIH,
Device B detects that the neighbor field in the IIH contains its MAC address, and
sets its neighbor status with Device A to Up.
DIS Election
On a broadcast network, any two routers exchange information. If n routers are available
on the network, n x (n - 1)/2 adjacencies must be established. Each status change of a
router is transmitted to other routers, which wastes bandwidth resources. IS-IS resolves
this problem by introducing the DIS. All routers send information to the DIS, which then
broadcasts the network link status. Using the DIS and pseudo nodes simplifies network
topology and reduces the length of LSPs generated by routers. When the network
changes, fewer LSPs are generated. Therefore, fewer resources are consumed.
A DIS is elected after a neighbor relationship is established. Level-1 and Level-2 DISs
are elected separately. You can configure different priorities for DISs at different levels.
In DIS election, a Level-1 priority and a Level-2 priority are specified for every interface
on every router. A router uses every interface to send IIHs and advertises its priorities in
the IIHs to neighboring routers. The higher the priority, the higher the probability of
being elected as the DIS. If there are multiple routers with the same highest priority on a
broadcast network, the one with the largest MAC address is elected. The DISs at
different levels can be the same router or different routers.
In the DIS election procedure, IS-IS is different from Open Shortest Path First (OSPF).
In IS-IS, DIS election rules are as follows:
− The router with the priority of 0 also takes part in the DIS election.
− When a new router that meets the requirements of being a DIS is added to the
broadcast network, the router is selected as the new DIS, which triggers a new
round of LSP flooding.
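The election rules above reduce to a one-line comparison: highest priority wins, with the largest MAC address as the tie-breaker. A minimal sketch (the tuple data model is an assumption for illustration):

```python
def elect_dis(candidates):
    """Pick the DIS from (priority, mac) tuples.

    Highest priority wins; ties break to the largest MAC address (compared
    here as equal-format hex strings). Unlike OSPF's DR election, a priority
    of 0 still participates, and a better new candidate preempts the DIS.
    """
    return max(candidates, key=lambda c: (c[0], c[1]))
```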
Establishment of a neighbor relationship on a P2P link
The establishment of a neighbor relationship on a P2P link is different from that on a
broadcast link. A neighbor relationship on a P2P link can be established in 2-way or
3-way mode, as shown in Table 1-240. By default, the 3-way handshake mechanism is
used to establish a neighbor relationship on a P2P link.
LSDB Synchronization
IS-IS is a link-state protocol. An IS-IS router obtains first-hand information from other routers
running link-state protocols. Every router generates information about itself, directly
connected networks, and links between itself and directly connected networks. The router
then sends the generated information to other routers through adjacent routers. Every router
saves link state information without modifying it. Finally, every router has the same network
interworking information, and LSDB synchronization is complete. The process of
synchronizing LSDBs is called LSP flooding. In LSP flooding, a router sends an LSP to its
neighbors and the neighbors send the received LSP to their neighbors except the router that
first sends the LSP. The LSP is flooded among the routers at the same level. This
implementation allows each router at the same level to have the same LSP information and
keep a synchronized LSDB.
All routers in the IS-IS routing domain can generate LSPs. A new LSP is generated in any of
the following situations:
A neighbor goes Up or Down.
A related interface goes Up or Down.
Route Calculation
When LSDB synchronization is complete and network convergence is implemented, IS-IS
performs SPF calculation by using LSDB information to obtain the SPT. IS-IS uses the SPT to
create a forwarding database (a routing table).
In IS-IS, link costs are used to calculate shortest paths. The default cost for an interface on a
Huawei router is 10. The cost is configurable. The cost of a route is the sum of the cost of
every outbound interface along the route. There may be multiple routes to a destination,
among which the route with the smallest cost is the optimal route.
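The cost-summing route calculation described above can be sketched with a standard Dijkstra SPF run (an illustrative model, with costs chosen arbitrarily for the demo):

```python
import heapq

def spf(links, root):
    """Dijkstra SPF sketch: links maps node -> {neighbor: outbound cost}.

    The cost of a route is the sum of the costs of the outbound interfaces
    along it; the route with the smallest total cost is the optimal route.
    """
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue                      # stale heap entry, skip
        for nbr, cost in links.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist
```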
Level-1 routers can also calculate the shortest path to Level-2 routers to implement inter-area
route selection. When a Level-1-2 router is connected to other areas, the router sets the value
of the attachment (ATT) bit in its LSP to 1 and sends the LSP to neighboring routers. In the
route calculation process, a Level-1 router selects the nearest Level-1-2 router as an
intermediate router between the Level-1 and Level-2 areas.
Route Leaking
When Level-1 and Level-2 areas both exist on an IS-IS network, Level-2 routers do not
advertise the learned routing information about a Level-1 area and the backbone area to any
other Level-1 area by default. Therefore, Level-1 routers do not know the routing information
beyond the local area. As a result, the Level-1 routers cannot select the optimal routes to the
destination beyond the local area.
With route leaking, Level-1-2 routers can select routes using routing policies, or tags and
advertise the selected routes of other Level-1 areas and the backbone area to the Level-1 area.
Figure 1-827 shows the typical networking for route leaking.
Device A, Device B, Device C, and Device D belong to area 10. Device A and Device B
are Level-1 routers. Device C and Device D are Level-1-2 routers.
Device E and Device F belong to area 20 and are Level-2 routers.
If Device A sends a packet to Device F, the selected optimal route should be Device A ->
Device B -> Device D -> Device E -> Device F because its cost is 40 (10 + 10 + 10 + 10 = 40)
which is less than that of Device A -> Device C -> Device E -> Device F (10 + 50 + 10 = 70).
However, if you check routes on Device A, you can find that the selected route is Device A ->
Device C -> Device E -> Device F, which is not the optimal route from Device A to Device F.
This is because Device A does not know the routes beyond the local area, and therefore, the
packets sent by Device A to other network segments are sent through the default route
generated by the nearest Level-1-2 device.
In this case, you can enable route leaking on the Level-1-2 devices (Device C and Device D).
Then, check the route and you can find that the selected route is Device A -> Device B ->
Device D -> Device E -> Device F.
Route Summarization
On a large-scale IS-IS network, links connected to devices within an IP address range may
alternate between Up and Down. With route summarization, multiple routes with the same IP
prefix are summarized into one route, which prevents route flapping, reduces routing entries
and system resource consumption, and facilitates route management. Figure 1-828 shows the
typical networking for route summarization.
Device A, Device B, and Device C use IS-IS to communicate with each other.
Device A belongs to area 20, and Device B and Device C belong to area 10.
Device A is a Level-2 router. Device B is a Level-1-2 router. Device C is a Level-1
router.
Device B maintains Level-1 and Level-2 LSDBs and leaks the routes to three network
segments (172.1.1.0/24, 172.1.2.0/24, and 172.1.3.0/24) from the Level-1 area to the
Level-2 area. If a link fault causes the Device C interface with IP address 172.1.1.1/24 to
frequently alternate between Up and Down, the status change is advertised to the Level-2
area, triggering frequent LSP flooding and SPF calculation on Device A. As a result, the
CPU usage on Device A increases, and even network flapping occurs.
On Device B, routes to the three network segments in the Level-1 area are summarized
to one route to 172.1.0.0/16, which reduces the number of routing entries on Device B
and minimizes the impact of route flapping in the Level-1 area on route convergence in
the Level-2 area.
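The prerequisite for this summarization, namely that the configured summary prefix covers every specific Level-1 prefix, can be checked with the standard library (an illustrative check, not device logic):

```python
import ipaddress

# Configured summary route and the specific Level-1 prefixes from the example.
summary = ipaddress.ip_network("172.1.0.0/16")
specifics = ["172.1.1.0/24", "172.1.2.0/24", "172.1.3.0/24"]

# Only if every specific prefix falls within the summary can the Level-1-2
# router advertise the single summary route into the Level-2 area.
covered = all(ipaddress.ip_network(p).subnet_of(summary) for p in specifics)
```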
Load Balancing
If multiple equal-cost routes are available on a network, they can load-balance traffic, which
improves link usage and prevents network congestion caused by link overload. Figure 1-829
shows the typical networking for load balancing.
Administrative Tag
Administrative tags carry administrative information about IP address prefixes. When the cost
type is wide, wide-compatible, or compatible and the prefix of the reachable IP address to be
advertised by IS-IS has this cost type, IS-IS adds the administrative tag to the reachability
type-length-value (TLV) in the prefix. In this manner, the administrative tag is advertised
throughout the entire IS-IS area so that routes can be imported or filtered based on the
administrative tag.
synchronization of the LSDBs in the entire network segment using the CSNP and PSNP
mechanisms.
Link Group
In Figure 1-830, router A is dual-homed to the IS-IS network through router B and router C.
The path router A -> router B is primary and the path router A -> router C is backup. The
bandwidth of each link is 100 Gbit/s, and the traffic from Client is transmitted at 150 Gbit/s.
In this situation, both links in the path router A -> router B or the path router A -> router C
need to carry the traffic. If Link-a fails, Link-b takes over the traffic. However, traffic loss
occurs because the bandwidth of Link-b is not sufficient to carry the traffic.
To address this problem, configure link groups. You can add multiple links to a link group. If
one of the links fails and the bandwidth of the remaining links in the group is not sufficient to
carry the traffic, the link group automatically increases the costs of the other links to a
configured value so that this link group is not selected. Then, traffic is switched to another
link group.
In Figure 1-830, Link-a and Link-b belong to link group 1, and Link-c and Link-d belong to
link group 2.
If Link-a fails, link group 1 automatically increases the cost of Link-b so that the traffic
is switched to link group 2.
If both Link-a and Link-c fail, the link groups increase the costs of Link-b and Link-d (to
the same value) so that Link-b and Link-d load-balance the traffic.
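The link group rule can be sketched as follows; the dict-based link records and the function name are assumptions for illustration, not the device's data model:

```python
def adjust_link_group(links, demand, penalty_cost):
    """If a member link is down and the remaining bandwidth of the group
    cannot carry the traffic demand, raise the cost of the surviving links
    to a configured value so this link group is no longer selected."""
    up_links = [l for l in links if l["up"]]
    if len(up_links) < len(links) and sum(l["bandwidth"] for l in up_links) < demand:
        for l in up_links:
            l["cost"] = penalty_cost      # configured high cost
    return links
```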
Background
If the status of an interface carrying IS-IS services alternates between Up and Down, IS-IS
neighbor relationship flapping occurs on the interface. During the flapping, IS-IS frequently
sends Hello packets to reestablish the neighbor relationship, synchronizes LSDBs, and
recalculates routes. In this process, a large number of packets are exchanged, adversely
affecting neighbor relationship stability, IS-IS services, and other IS-IS-dependent services,
such as LDP and BGP. IS-IS neighbor relationship flapping suppression can address this
problem by delaying IS-IS neighbor relationship reestablishment or preventing service traffic
from passing through flapping links.
Related Concepts
Flapping_event: reported when the status of a neighbor relationship on an interface last
changes from Up to Init or Down. The flapping_event triggers flapping detection.
Flapping_count: number of times flapping has occurred.
Detect-interval: interval at which flapping is detected. The interval is used to determine
whether to trigger a valid flapping_event.
Threshold: flapping suppression threshold. When the flapping_count exceeds the threshold,
flapping suppression takes effect.
Resume-interval: interval used to determine whether flapping suppression exits. If the
interval between two valid flapping_events is longer than the resume-interval, flapping
suppression exits.
Implementation
Flapping detection
IS-IS interfaces start a flapping counter. If the interval between two flapping_events is shorter
than the detect-interval, a valid flapping_event is recorded, and the flapping_count increases
by 1. When the flapping_count exceeds the threshold, the system determines that flapping
occurs, and therefore triggers flapping suppression, and sets the flapping_count to 0. If the
interval between two valid flapping_events is longer than the resume-interval before the
flapping_count reaches the threshold again, the system sets the flapping_count to 0 again.
Interfaces start the suppression timer when the status of a neighbor relationship last changes
to Init or Down.
The detect-interval, threshold, and resume-interval are configurable.
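The counter logic above can be sketched as a small state machine; the class and method names are illustrative, and the parameters mirror detect-interval, threshold, and resume-interval:

```python
class FlapDetector:
    """Sketch of the IS-IS flapping counter described above."""

    def __init__(self, detect_interval, threshold, resume_interval):
        self.detect_interval = detect_interval
        self.threshold = threshold
        self.resume_interval = resume_interval
        self.count = 0            # flapping_count
        self.last_event = None    # time of the last flapping_event
        self.last_valid = None    # time of the last valid flapping_event

    def event(self, now):
        """Record a flapping_event at time `now`; return True if suppression triggers."""
        triggered = False
        if self.last_event is not None and now - self.last_event < self.detect_interval:
            # Interval shorter than detect-interval: a valid flapping_event.
            if self.last_valid is not None and now - self.last_valid > self.resume_interval:
                self.count = 0    # quiet period exceeded resume-interval: restart count
            self.count += 1
            self.last_valid = now
            if self.count > self.threshold:
                triggered = True  # flapping suppression takes effect
                self.count = 0    # counter reset after suppression triggers
        self.last_event = now
        return triggered
```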
Flapping suppression
Flapping suppression works in either Hold-down or Hold-max-cost mode.
Hold-down mode: In the case of frequent flooding and topology changes during neighbor
relationship establishment, interfaces prevent neighbor relationships from being
reestablished during the suppression period, which minimizes LSDB synchronization
attempts and packet exchanges.
Hold-max-cost mode: If the traffic forwarding path changes frequently, interfaces use the
maximum cost of the flapping link during the suppression period, which prevents traffic
from passing through the flapping link.
Flapping suppression can also work first in Hold-down mode and then in Hold-max-cost
mode.
By default, the Hold-max-cost mode takes effect. The mode and suppression period can be
changed manually.
When an interface enters the flapping suppression state, all neighbor relationships on the interface enter
the state accordingly.
Basic scenario
In Figure 1-831, the traffic forwarding path is Device A -> Device B -> Device C -> Device E
before a link failure occurs. After the link between Device B and Device C fails, the
forwarding path switches to Device A -> Device B -> Device D -> Device E. If the neighbor
relationship between Device B and Device C frequently flaps at the early stage of the path
switchover, the forwarding path will be switched frequently, causing traffic loss and affecting
network stability. If the neighbor relationship flapping meets suppression conditions, flapping
suppression takes effect.
If flapping suppression works in Hold-down mode, the neighbor relationship between
Device B and Device C is prevented from being reestablished during the suppression
period, in which traffic is forwarded along the path Device A -> Device B -> Device D
-> Device E.
If flapping suppression works in Hold-max-cost mode, the maximum cost is used as the
cost of the link between Device B and Device C during the suppression period, and
traffic is forwarded along the path Device A -> Device B -> Device D -> Device E.
When only one forwarding path exists on the network, the flapping of the neighbor
relationship between any two devices on the path will interrupt traffic forwarding. In Figure
1-832, the traffic forwarding path is Device A -> Device B -> Device C -> Device E. If the
neighbor relationship between Device B and Device C flaps, and the flapping meets
suppression conditions, flapping suppression takes effect. However, if the neighbor
relationship between Device B and Device C is prevented from being reestablished, the whole
network will be divided. Therefore, Hold-max-cost mode (rather than Hold-down mode) is
recommended. If flapping suppression works in Hold-max-cost mode, the maximum cost is
used as the cost of the link between Device B and Device C during the suppression period.
After the network stabilizes and the suppression timer expires, the link is restored.
Broadcast scenario
In Figure 1-833, four devices are deployed on the same broadcast network using switches, and
the devices are broadcast network neighbors. If Device C flaps due to a link failure, and
Device A and Device B were deployed at different times (for example, Device A was deployed
earlier) or the flapping suppression parameters on Device A and Device B are different,
Device A first detects the flapping and suppresses Device C. Consequently, the Hello packets
sent by Device A do not carry Device C's router ID. However, Device B has not detected the
flapping yet and still considers Device C a valid node. As a result, the DR candidates
identified by Device A are Device B and Device D, whereas the DR candidates identified by
Device B are Device A, Device C, and Device D. Different DR candidates result in a different
DR election result, which may lead to route calculation errors. To prevent this problem in
scenarios where an interface has multiple neighbors, such as on a broadcast, P2MP, or NBMA
network, all neighbors on the interface are suppressed when the status of a neighbor
relationship last changes to ExStart or Down. Specifically, if Device C flaps, Device A,
Device B, and Device D on the broadcast network are all suppressed. After the network
stabilizes and the suppression timer expires, Device A, Device B, and Device D are restored
to normal status.
By default, the Hold-max-cost mode takes effect. The mode can be changed to Hold-down manually.
Scenario with both LDP-IGP synchronization and IS-IS neighbor relationship flapping
suppression configured
In Figure 1-835, if the link between PE1 and P1 fails, an LDP LSP switchover is implemented
immediately, causing the original LDP LSP to be deleted before a new LDP LSP is established.
To prevent traffic loss, LDP-IGP synchronization needs to be configured. With LDP-IGP
synchronization, the maximum cost is used as the cost of the new LSP to be established. After
the new LSP is established, the original cost takes effect. Consequently, the original LSP is
deleted, and LDP traffic is forwarded along the new LSP.
LDP-IGP synchronization and IS-IS neighbor relationship flapping suppression work in either
Hold-down or Hold-max-cost mode. If both functions are configured, Hold-down mode takes
precedence over Hold-max-cost mode, followed by the configured link cost. Table 1-241 lists
the suppression modes that take effect in different situations.
Table 1-241 Principles for selecting the suppression modes that take effect in different situations
| IS-IS Neighbor Relationship Flapping Suppression Mode | LDP-IGP Synchronization Hold-down Mode | LDP-IGP Synchronization Hold-max-cost Mode | Exited from LDP-IGP Synchronization Suppression |
| --- | --- | --- | --- |
| IS-IS Neighbor Relationship Flapping Suppression Hold-down Mode | Hold-down | Hold-down | Hold-down |
| IS-IS Neighbor Relationship Flapping Suppression Hold-max-cost Mode | Hold-down | Hold-max-cost | Hold-max-cost |
| Exited from IS-IS Neighbor Relationship Flapping Suppression | Hold-down | Hold-max-cost | Exited from LDP-IGP synchronization and IS-IS neighbor relationship flapping suppression |
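The precedence rule behind the table (Hold-down takes precedence over Hold-max-cost, followed by the configured link cost once suppression exits) can be expressed as a small helper. The mode names and the function are illustrative, not part of the product:

```python
# Precedence: Hold-down > Hold-max-cost > exited (configured link cost applies).
PRECEDENCE = {"hold-down": 2, "hold-max-cost": 1, "exited": 0}

def effective_mode(ldp_igp_sync_mode, isis_flap_mode):
    """Return the suppression mode that takes effect when both LDP-IGP
    synchronization and IS-IS flapping suppression are configured."""
    return max(ldp_igp_sync_mode, isis_flap_mode, key=PRECEDENCE.get)
```

For example, if LDP-IGP synchronization is in Hold-down mode while IS-IS flapping suppression is in Hold-max-cost mode, Hold-down takes effect, matching the table.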
For example, the link between PE1 and P1 frequently flaps in Figure 1-835, and both
LDP-IGP synchronization and IS-IS neighbor relationship flapping suppression are
configured. In this case, the suppression mode is selected based on the preceding principles.
No matter which mode (Hold-down or Hold-max-cost) is selected, the forwarding path is PE1
-> P4 -> P3 -> PE2.
Figure 1-835 Scenario with both LDP-IGP synchronization and IS-IS neighbor relationship
flapping suppression configured
Figure 1-836 Scenario with both Link-bundle and IS-IS neighbor relationship flapping
suppression configured
Partial route calculation (PRC) calculates only those routes which have changed when
the network topology changes.
Link State PDUs (LSP) fast flooding
LSP fast flooding speeds up LSP flooding.
Intelligent timer
The first timeout period of the timer is fixed. If an event that triggers the timer occurs
before the set timer expires, the next timeout period of the timer increases.
The intelligent timer applies to LSP generation and SPF calculation.
I-SPF
In ISO 10589, the Dijkstra algorithm was adopted to calculate routes. When a node changes
on the network, the algorithm recalculates all routes. The calculation requires a long time to
complete and consumes a significant amount of CPU resources, reducing convergence speed.
I-SPF improves on this algorithm: except for the first run, only the nodes that have changed,
rather than all nodes on the network, are involved in the calculation. The SPT
generated using I-SPF is the same as that generated using the previous algorithm. This
significantly decreases CPU usage and speeds up network convergence.
PRC
Similar to I-SPF, PRC calculates only routes that have changed. PRC, however, does not
calculate the shortest path. It updates routes based on the SPT calculated by I-SPF.
In route calculation, a leaf represents a route, and a node represents a device. If the SPT
changes after I-SPF calculation, PRC calculates all the leaves only on the changed node. If the
SPT remains unchanged, PRC calculates only the changed leaves.
For example, if IS-IS is enabled on an interface of a node, the SPT calculated by I-SPF
remains unchanged. In this case, PRC updates only the routes of this interface, which
consumes less CPU resources.
PRC working with I-SPF further improves network convergence performance and replaces
the original SPF algorithm.
On the NE20E, only I-SPF and PRC are used to calculate IS-IS routes.
Intelligent Timer
Although the route calculation algorithm is improved, the long interval for triggering route
calculation also affects the convergence speed. A millisecond-level timer can shorten the
interval. Frequent network changes, however, also consume too much CPU resources. The
SPF intelligent timer addresses these problems.
In most cases, an IS-IS network running normally is stable. The frequent changes on a
network are rather rare, and IS-IS does not calculate routes frequently. Therefore, a short
period (within milliseconds) can be configured as the first interval for route calculation. If the
network topology changes frequently, the interval set by the intelligent timer increases with
the calculation times to reduce CPU consumption.
The LSP generation intelligent timer is similar to the SPF intelligent timer. When the LSP
generation intelligent timer expires, the system generates a new LSP based on the current
topology. The original mechanism uses a timer with fixed intervals, which results in slow
convergence and high CPU consumption. Therefore, the LSP generation timer is designed as
an intelligent timer to respond to emergencies (for example, the interface goes Up or Down)
quickly and speed up network convergence. In addition, when the network changes frequently,
the interval for the intelligent timer becomes longer to reduce CPU consumption.
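The backoff behavior of the intelligent timer can be sketched as follows. The doubling policy, cap, and parameter names are assumptions for illustration; the actual VRP increment is not specified here:

```python
def intelligent_timer_intervals(initial_ms, max_ms, events):
    """Sketch of the intelligent timer: the first timeout is a short,
    fixed initial interval; each retriggering before expiry increases
    the next interval (doubled here, illustratively) up to a cap."""
    intervals = []
    current = initial_ms
    for _ in range(events):
        intervals.append(current)
        current = min(current * 2, max_ms)
    return intervals
```

A first interval of 50 ms responds quickly to a single change, while sustained flapping stretches later intervals toward the cap, reducing CPU consumption.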
Terms
Originating system
The originating system is a device that runs the IS-IS protocol. A single IS-IS process
can advertise LSPs in the role of multiple virtual devices, but the originating system
refers to the real IS-IS process.
Normal system ID
The normal system ID is the system ID of the originating system.
Additional system ID
The additional system ID, assigned by the network administrator, is used to generate
additional or extended LSP fragments. A maximum of 256 additional or extended LSP
fragments can be generated. Like a normal system ID, an additional system ID must be
unique in a routing domain.
Virtual system
The virtual system, identified by an additional system ID, is used to generate extended
LSP fragments. These fragments carry additional system IDs in their LSP IDs.
Principles
IS-IS LSP fragments are identified by the LSP Number field in their LSP IDs. The LSP
Number field is 1 byte. Therefore, an IS-IS process can generate a maximum of 256
fragments. With fragment extension, more information can be carried.
Each additional system ID represents a virtual system, and each virtual system can generate
256 LSP fragments. In addition, multiple virtual systems can be configured. Therefore, an
IS-IS process can generate many more LSP fragments.
After a virtual system and fragment extension are configured, an IS-IS device adds the
contents that cannot be contained in its LSPs to the LSPs of the virtual system and notifies
other devices of the relationship between the virtual system and itself through a special TLV
in the LSPs.
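The fragment capacity described above follows directly from the 1-byte LSP Number field; a minimal sketch (the function name is ours):

```python
def max_lsp_fragments(virtual_systems):
    """The LSP Number field is 1 byte, so each system ID (the normal
    system ID plus each additional system ID) can generate 2**8 = 256
    LSP fragments."""
    FRAGMENTS_PER_SYSTEM = 2 ** 8  # 1-byte LSP Number field
    return FRAGMENTS_PER_SYSTEM * (1 + virtual_systems)
```

Without fragment extension an IS-IS process is limited to 256 fragments; with two virtual systems configured, it can generate up to 768.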
IS Alias ID TLV
Standard protocols define a special type-length-value (TLV): the IS Alias ID TLV.
LSPs with fragment number 0 sent by the originating system and the virtual systems carry
IS Alias ID TLVs that identify the originating system.
Operation Modes
IS-IS devices can use the LSP fragment extension feature in the following modes:
Mode-1
Mode-1 is used when some devices on the network do not support LSP fragment
extension.
In this mode, virtual systems participate in SPF calculation. The originating system
advertises LSPs containing information about links to each virtual system and each
virtual system advertises LSPs containing information about links to the originating
system. In this manner, the virtual systems function the same as the actual devices
connected to the originating system on the network.
Mode-1 is a transitional mode for earlier versions that do not support LSP fragment
extension. In the earlier versions, IS-IS cannot identify Alias ID TLVs. Therefore, the
LSP sent by a virtual system must look like a common IS-IS LSP.
The LSP sent by a virtual system contains the same area address and overload bit as
those in the common LSP. If the LSPs sent by a virtual system contain TLVs specified in
other features, the TLVs must be the same as those in common LSPs.
LSPs sent by a virtual system carry information of the neighbor (the originating system),
and the carried cost is the maximum value minus 1. LSPs sent by the originating system
carry information of the neighbor (the virtual system), and the carried cost is 0. This
mechanism ensures that the virtual system is a node downstream of the originating
system when other devices calculate routes.
In Figure 1-837, Device B does not support LSP fragment extension; Device A supports
LSP fragment extension in mode-1; Device A1 and Device A2 are virtual systems of
Device A. Device A1 and Device A2 send LSPs carrying partial routing information of
Device A. After receiving LSPs from Device A, Device A1, and Device A2, Device B
considers there to be three devices at the peer end and calculates routes normally.
Because the cost of the route from Device A to Device A1 or Device A2 is 0, the cost of
the route from Device B to Device A is equal to that from Device B to Device A1.
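The mode-1 cost rule can be checked with a tiny worked example. The link costs below are illustrative, with 63 assumed as the narrow-metric maximum:

```python
# Mode-1 cost rule: the originating system (Device A) advertises the link
# to each virtual system with cost 0, and the virtual system advertises
# the reverse link with the maximum cost minus 1 (63 - 1 = 62 here, an
# assumed narrow-metric maximum). The B-A link cost of 10 is illustrative.
MAX_COST = 63

links = {
    ("B", "A"): 10,              # real link
    ("A", "A1"): 0,              # originating system -> virtual system
    ("A1", "A"): MAX_COST - 1,   # virtual system -> originating system
}

cost_b_to_a = links[("B", "A")]
cost_b_to_a1 = links[("B", "A")] + links[("A", "A1")]
```

Because the A-to-A1 cost is 0, Device B computes the same cost to Device A1 as to Device A, and the large reverse cost keeps the virtual system strictly downstream.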
Mode-2
Mode-2 is used when all the devices on the network support LSP fragment extension. In
this mode, virtual systems do not participate in SPF calculation. All the devices on the
network know that the LSPs generated by the virtual systems actually belong to the
originating system.
IS-IS working in mode-2 identifies IS Alias ID TLVs, which are used to calculate the
SPT and routes.
In Figure 1-837, Device B supports LSP fragment extension, and Device A supports LSP
fragment extension in mode-2; Device A1 and Device A2 send LSPs carrying some
routing information of Device A. After receiving LSPs from Device A1 and Device A2,
Device B obtains IS Alias ID TLV and learns that the originating system of Device A1
and Device A2 is Device A. Device B then considers information advertised by Device
A1 and Device A2 to be about Device A.
A device that supports LSP fragment extension can resolve LSPs regardless of the mode.
A device that does not support LSP fragment extension can resolve only LSPs sent in mode-1.
Process
After LSP fragment extension is configured, if information is lost because LSPs overflow, the
system restarts the IS-IS process. After being restarted, the originating system loads as much
routing information as possible. Any excessive information beyond the forwarding capability
of the system is added to the LSPs of the virtual systems for transmission. In addition, if a
virtual system with routing information is deleted, the system automatically restarts the IS-IS
process.
Usage Scenario
If there are non-Huawei devices on the network, LSP fragment extension must be set to mode-1.
Otherwise, these devices cannot identify LSPs.
Configuring LSP fragment extension and virtual systems before setting up IS-IS neighbors or
importing routes is recommended. If IS-IS neighbors are set up or routes are imported first
and the information to be carried exceeds the forwarding capability of 256 fragments before
LSP fragment extension and virtual systems are configured, you have to restart the IS-IS
process for the configurations to take effect.
In addition, the 3-way handshake mechanism uses a 32-bit Extended Local Circuit ID field,
which extends the original 8-bit Local Circuit ID field and removes the limit of 255 P2P
links.
1.10.8.2.10 IS-IS TE
IS-IS TE supports MPLS establishment and maintenance of Constraint-based Routed Label
Switched Paths (CR-LSPs).
To establish CR-LSPs, MPLS needs to learn the traffic attributes of all the links in the local
area. MPLS can acquire the TE information of the links through IS-IS.
Traditional routers select the shortest path as the primary route regardless of other factors,
such as bandwidth, even when the path is congested.
On the network shown in Figure 1-838, all the links have the same cost (10). The shortest path
from Device A/Device H to Device E is Device A/Device H → Device B → Device C →
Device D → Device E. Data is forwarded along this shortest path. Therefore, the path Device
A (Device H) → Device B → Device C → Device D → Device E may be congested while the
path Device A/Device H → Device B → Device F → Device G → Device D → Device E is
idle.
To resolve the preceding problem, the cost of the path from Device B to Device C can be set
to 30 so that the traffic is switched to the path Device A/Device H → Device B → Device F
→ Device G → Device D → Device E.
This method eliminates the congestion on the link Device A/Device H → Device B → Device
C → Device D → Device E; however, the other link Device A (Device H) → Device B →
Device F → Device G → Device D → Device E may be congested. In addition, on networks
with complicated topologies, changing the cost of one link may affect multiple routes.
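The path switch described above can be reproduced with a plain Dijkstra run over the Figure 1-838 topology. The code is an illustrative sketch, not router logic; links are listed one-way for brevity:

```python
import heapq

def shortest_path(graph, src, dst):
    """Plain Dijkstra over a {(u, v): cost} dict; returns (cost, path)."""
    pq = [(0, src, [src])]
    seen = set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node in seen:
            continue
        seen.add(node)
        if node == dst:
            return cost, path
        for (u, v), c in graph.items():
            if u == node and v not in seen:
                heapq.heappush(pq, (cost + c, v, path + [v]))
    return None

# Topology of Figure 1-838: every link has cost 10 initially.
graph = {("A", "B"): 10, ("B", "C"): 10, ("C", "D"): 10, ("D", "E"): 10,
         ("B", "F"): 10, ("F", "G"): 10, ("G", "D"): 10}

cost1, path1 = shortest_path(graph, "A", "E")  # via C
graph[("B", "C")] = 30                         # raise the B->C cost to 30
cost2, path2 = shortest_path(graph, "A", "E")  # via F and G
```

Before the change the shortest path runs through Device C at cost 40; after the B-to-C cost is raised to 30, traffic switches to the path through Device F and Device G at cost 50, just as described above.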
As an overlay model, MPLS can set up a virtual topology over the physical network topology
and map traffic to the virtual topology, effectively combining MPLS and TE technology into
MPLS TE.
MPLS TE can resolve network congestion problems by allowing carriers to precisely control
the path through which traffic passes and to prevent traffic from passing through congested
nodes. Meanwhile, MPLS TE can reserve resources during the establishment of LSPs to
ensure service quality.
To ensure continuity of services, MPLS TE provides the CR-LSP backup and fast reroute
(FRR) mechanisms. If a link fault occurs, traffic can be switched immediately. Through
MPLS TE, service providers (SPs) can fully utilize the current network resources to provide
diverse services, optimize network resources, and methodically manage the network.
To accomplish the preceding tasks, MPLS TE needs to learn TE information about all devices
on the network. However, MPLS TE itself lacks a mechanism for each device to flood its TE
information throughout the entire network for TE information synchronization, whereas IS-IS
does provide such a mechanism. Therefore, MPLS TE can advertise and synchronize TE
information with the help of IS-IS. To support MPLS TE, IS-IS needs to be extended.
In brief, IS-IS TE collects TE information on IS-IS networks and then transmits the TE
information to the Constrained Shortest Path First (CSPF) module.
Basic Principles
IS-IS TE is an extension of IS-IS intended to support MPLS TE. As defined in standard
protocols, IS-IS TE introduces new TLVs in IS-IS LSPs to carry TE information to help MPLS
implement the flooding, synchronization, and resolution of TE information. Then, IS-IS TE
transmits the resolved TE information to the CSPF module. In MPLS TE, IS-IS TE plays the
role of a porter. Figure 1-839 illustrates the relationships between IS-IS TE, MPLS TE, and
CSPF.
Figure 1-839 Outline of relationships between MPLS TE, CSPF, and IS-IS TE
To carry TE information in LSPs, IS-IS TE defines the following TLVs in standard protocols:
Extended IS reachability TLV
The Extended IS reachability TLV replaces the IS reachability TLV and extends the TLV
format using sub-TLVs. The implementation of sub-TLVs in TLVs is the same as that of
TLVs in LSPs. Sub-TLVs are used to carry TE information configured on physical
interfaces.
Usage Scenario
IS-IS TE helps MPLS TE set up TE tunnels. In Figure 1-840, a TE tunnel is set up between
Device A and Device C.
The metric style can be set to narrow, narrow-compatible, compatible, wide-compatible, or wide mode.
Table 1-245 shows which metric styles are carried in received and sent packets. A device can calculate
routes only when it can receive, send, and process corresponding TLVs. Therefore, to ensure correct data
forwarding on a network, the proper metric style must be configured for each device on the network.
Table 1-245 Metric styles carried in received and sent packets under different metric style configurations
When the metric style is set to compatible, IS-IS sends the information both in narrow and
wide modes.
Process
If the metric style carried in sent packets is changed from narrow to wide:
The information previously carried by TLV type 128, TLV type 130, and TLV type 2 is
now carried by TLV type 135 and TLV type 22.
If the metric style carried in sent packets is changed from wide to narrow:
The information previously carried by TLV type 135 and TLV type 22 is now carried by
TLV type 128, TLV type 130, and TLV type 2.
If the metric style carried in sent packets is changed from narrow or wide to narrow and
wide:
The information previously carried in narrow or wide mode is now carried by TLV type
128, TLV type 130, TLV type 2, TLV type 135, and TLV type 22.
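The TLV mapping in the process above can be summarized in a small lookup; the function is illustrative:

```python
# Reachability TLV types used for each metric style, as listed above.
NARROW_TLVS = {2, 128, 130}  # IS reach, IP internal reach, IP external reach
WIDE_TLVS = {22, 135}        # Extended IS reach, Extended IP reach

def advertised_tlvs(metric_style):
    """Return the reachability TLV types sent for a metric style."""
    if metric_style == "narrow":
        return NARROW_TLVS
    if metric_style == "wide":
        return WIDE_TLVS
    if metric_style == "compatible":
        # Compatible mode sends the information in both narrow and wide form.
        return NARROW_TLVS | WIDE_TLVS
    raise ValueError(metric_style)
```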
Usage Scenario
IS-IS wide metric is used to support IS-IS TE, and the metric style needs to be set to wide,
compatible, or wide-compatible.
Instead of replacing the Hello mechanism of IS-IS, BFD works with IS-IS to rapidly detect the faults
that occur on neighboring devices or links.
On broadcast networks, BFD sessions are established only between the DIS and each
device. No BFD sessions are established between non-DISs.
On broadcast networks, devices (including non-DIS devices) of the same level on a
network segment can establish adjacencies. In BFD for IS-IS, however, BFD sessions are
established only between the DIS and non-DISs. On P2P networks, BFD sessions are
directly established between neighbors.
If a Level-1-2 neighbor relationship is set up between the devices on both ends of a link,
the following situations occur:
− On a broadcast network, IS-IS sets up a Level-1 BFD session and a Level-2 BFD
session.
− On a P2P network, IS-IS sets up only one BFD session.
Process of tearing down a BFD session
− P2P network
If the neighbor relationship established between P2P IS-IS interfaces is not Up,
IS-IS tears down the BFD session.
− Broadcast network
If the neighbor relationship established between broadcast IS-IS interfaces is not Up
or the DIS is reelected on the broadcast network, IS-IS tears down the BFD session.
If the configurations of dynamic BFD sessions are deleted or BFD for IS-IS is disabled
from an interface, all Up BFD sessions established between the interface and its
neighbors are deleted. If the interface is a DIS and the DIS is Up, all BFD sessions
established between the interface and its neighbors are deleted.
If BFD is disabled from an IS-IS process, BFD sessions are deleted from the process.
BFD detects only the one-hop link between IS-IS neighbors because IS-IS establishes only one-hop
neighbor relationships.
Usage Scenario
Dynamic BFD needs to be configured based on the actual network. If the time parameters are
not configured correctly, network flapping may occur.
BFD for IS-IS speeds up route convergence through rapid link failure detection. The
following is a networking example for BFD for IS-IS.
Background
IS-IS Auto fast reroute (FRR) is a dynamic IP FRR technology. An IGP pre-computes an
alternate link based on the LSDBs of the entire network and stores it in the FIB; if a link or
adjacent-node failure is detected, traffic is immediately switched to the alternate link,
minimizing traffic loss. Because IP FRR protects traffic without waiting for route
convergence to complete, it is becoming increasingly popular with carriers.
Major Auto FRR techniques include loop-free alternate (LFA), U-turn, Not-Via, Remote LFA,
and MRT, among which IS-IS supports only LFA and Remote LFA.
Related Concepts
LFA
LFA is an IP FRR technology that calculates the shortest path from the neighbor that can
provide an alternate link to the destination node based on the Shortest Path First (SPF)
algorithm. Then, LFA calculates a loop-free alternate link with the smallest cost based on the
inequality: Distance_opt (N, D) < Distance_opt (N, S) + Distance_opt (S, D).
In the preceding inequality, S, D, and N indicate the source node, destination node, and a
neighbor of S, respectively, and Distance_opt (X,Y) indicates the shortest distance from node
X to node Y.
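The LFA inequality can be written directly as a predicate. `dist` is an illustrative mapping of precomputed shortest distances; the sample values are ours:

```python
def is_loop_free_alternate(dist, n, s, d):
    """LFA inequality from above:
    Distance_opt(N, D) < Distance_opt(N, S) + Distance_opt(S, D).
    `dist` maps (x, y) to the shortest distance from node x to node y."""
    return dist[(n, d)] < dist[(n, s)] + dist[(s, d)]

# Illustrative distances: neighbor N reaches D in 15 without looping back
# through the source S (5 + 20 = 25), so N is a loop-free alternate.
dist = {("N", "D"): 15, ("N", "S"): 5, ("S", "D"): 20}
```

If the neighbor's own distance to the destination were not smaller than the detour through S (for example 30 versus 25), the inequality would fail and the neighbor could not be used as an alternate.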
Remote LFA
LFA Auto FRR cannot be used to calculate alternate links on large-scale networks, especially
on ring networks. Remote LFA Auto FRR addresses this problem by calculating a PQ node
and establishing a tunnel between the source node of a primary link and the PQ node. If the
primary link fails, traffic can be automatically switched to the tunnel, which improves
network reliability.
P space
P space consists of the nodes through which the shortest path trees (SPTs) with the source
node of a primary link as the root are reachable without passing through the primary link.
Extended P space
Extended P space consists of the nodes through which the SPTs with neighbors of a primary
link's source node as the root are reachable without passing through the primary link.
Q space
Q space consists of the nodes through which the SPTs with the destination node of a primary
link as the root are reachable without passing through the primary link.
PQ node
A PQ node exists both in the extended P space and Q space and is used by Remote LFA as the
destination of a protection tunnel.
Figure 1-842 Networking for IS-IS LFA Auto FRR link protection
b. The interface cost of the device satisfies the inequality: Distance_opt (N, D) <
Distance_opt (N, E) + Distance_opt (E, D).
Figure 1-843 Networking for IS-IS LFA Auto FRR node-and-link protection
On the network shown in Figure 1-844, Remote LFA calculates the PQ node as follows:
1. Calculates the SPTs with all neighbors of P1 as roots. The nodes through which the SPTs
are reachable without passing through the primary link form an extended P space. The
extended P space in this example is {PE1, P1, P3, P4}.
2. Calculates the SPTs with P2 as the root and obtains the Q space {PE2, P4}.
3. Selects the PQ node (P4) that exists both in the extended P space and Q space.
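The PQ-node selection in the steps above is a set intersection; a minimal sketch using the spaces from the example:

```python
def pq_nodes(extended_p_space, q_space):
    """PQ candidates are the nodes present in both the extended P space
    and the Q space."""
    return extended_p_space & q_space

# Spaces calculated in the Figure 1-844 example above.
extended_p = {"PE1", "P1", "P3", "P4"}
q = {"PE2", "P4"}
```

Intersecting the two spaces leaves only P4, which Remote LFA uses as the destination of the protection tunnel.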
Background
As the Internet develops, more data, voice, and video information are exchanged over the
Internet. New services, such as e-commerce, online conferencing and auctions, video on
demand, and distance learning, emerge gradually. The new services have high requirements
for network security. Carriers need to prevent data packets from being intercepted or modified
by attackers or unauthorized users. IS-IS authentication applies to the area or interface where
packets need to be protected. Using IS-IS authentication enhances system security and helps
carriers provide safe network services.
Related Concepts
Authentication Classification
Based on packet types, the authentication is classified as follows:
Interface authentication: is configured in the interface view to authenticate Level-1 and
Level-2 IS-to-IS Hello PDUs (IIHs).
Area authentication: is configured in the IS-IS process view to authenticate Level-1
CSNPs, PSNPs, and LSPs.
Routing domain authentication: is configured in the IS-IS process view to authenticate
Level-2 CSNPs, PSNPs, and LSPs.
Based on the authentication modes of packets, the authentication is classified into the
following types:
Simple authentication: The authenticated party directly adds the configured password to
packets for authentication. This authentication mode provides the lowest password
security.
MD5 authentication: uses the MD5 algorithm to encrypt a password before adding the
password to the packet, which improves password security.
Keychain authentication: further improves network security with a configurable key
chain that changes with time.
HMAC-SHA256 authentication: uses the HMAC-SHA256 algorithm to encrypt a
password before adding the password to the packet, which improves password security.
Implementation
IS-IS authentication encrypts IS-IS packets by adding the authentication field to packets to
ensure network security. After receiving IS-IS packets from a remote router, a local router
discards the packets if the authentication passwords in the packets are different from the
locally configured one. This mechanism protects the local router.
IS-IS provides a type-length-value (TLV) to carry authentication information. The TLV
components are as follows:
Type: indicates the TLV type and is 1 byte long. The value defined by ISO is 10,
while the value defined by IP is 133.
Length: indicates the length of the authentication TLV, which is 1 byte.
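The TLV layout above (1-byte Type, 1-byte Length, then the value) can be sketched as a small encoder. Prefixing the value with a one-byte authentication-type octet (1 for simple authentication) follows common IS-IS practice and is an assumption beyond the text above:

```python
def build_auth_tlv(tlv_type, auth_value):
    """Encode an authentication TLV: 1-byte Type, 1-byte Length, then the
    value. Type 10 is the ISO-defined value mentioned above."""
    if len(auth_value) > 255:
        raise ValueError("TLV value exceeds the 1-byte Length field")
    return bytes([tlv_type, len(auth_value)]) + auth_value

# Illustrative: simple authentication (assumed auth-type octet 0x01)
# followed by the password "secret".
tlv = build_auth_tlv(10, b"\x01secret")
```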
Background
If network-wide IS-IS LSPs are deleted, purge LSPs are flooded, which adversely affects
network stability. In this case, source tracing must be implemented to locate the root cause of
the fault immediately to minimize the impact. However, IS-IS itself does not support source
tracing. A conventional solution is to isolate nodes one by one until the faulty node is located.
This solution is complex and time-consuming. Therefore, a fast source tracing method is required.
Related Concepts
PS-PDU: packets that carry information about the node that floods IS-IS purge LSPs.
CAP-PDU: packets used to negotiate the IS-IS purge LSP source tracing capability
between IS-IS neighbors.
IS-IS purge LSP source tracing port: ID of the UDP port that receives and sends IS-IS
purge LSP source tracing packets. The default port ID is 50121, which is configurable.
Implementation
IS-IS purge LSPs do not carry source information. If a device fails on the network, a large
number of purge LSPs are flooded. Without a source tracing mechanism, nodes are isolated
one by one until the faulty node is located, which is labor-intensive and time-consuming.
IS-IS purge LSPs may trigger route flapping on the network or even render routes
unavailable. In this case, the device that floods the purge LSPs must be located and isolated
immediately.
A solution that can address the following issues is required:
Information about the source that floods IS-IS purge LSPs can be obtained when
network routes are unreachable.
The method used to obtain source information must apply to all devices on the network
and support incremental deployment, without compromising routing capabilities.
IS-IS purge LSP source tracing helps locate the device that floods purge LSPs. IS-IS purge
LSP source tracing uses a new UDP port. Source tracing packets are carried by UDP packets,
and the UDP packets also carry the IS-IS purge LSPs sent by the current device and are
flooded hop by hop based on the IS-IS topology.
IS-IS purge LSP source tracing forwards packets along UDP channels which are independent
of the channels used to transmit IS-IS packets. Therefore, IS-IS purge LSP source tracing
supports incremental deployment. In addition, source tracing does not affect the devices with
the related UDP port disabled.
After IS-IS purge LSP source tracing packets are flooded to devices on the network,
information about the node that floods purge LSPs can be queried on any of the devices,
which speeds up fault locating and faulty node isolation.
Capability Negotiation
IS-IS purge LSP source tracing is Huawei proprietary. It uses UDP to carry packets and listens
to the UDP port which is used to receive and send source tracing packets. If a source
tracing-capable Huawei device sends source tracing packets to a source tracing-incapable
Huawei device or non-Huawei device, the source tracing-capable Huawei device may be
incorrectly identified as an attacker. Therefore, the source tracing capability needs to be
negotiated between the devices. In addition, the source tracing-capable device needs to help
the source tracing-incapable device send source tracing information, which also requires
negotiation.
Source tracing capability negotiation depends on IS-IS neighbor relationships. Specifically,
after an IS-IS neighbor relationship is established, the local device initiates source tracing
capability negotiation based on the IP address of the neighbor.
PS-PDU Generation
If a device needs to purge an LSP, it generates and floods a PS-PDU to all its source tracing
neighbors.
If a device receives a purge LSP from a source tracing-incapable neighbor, the device
generates and floods a PS-PDU to all its neighbors. If a device receives the same purge LSP
(with the same LSP ID and sequence number) from more than one source tracing-incapable
neighbor, the device generates only one PS-PDU.
PS-PDU flooding is similar to IS-IS LSP flooding.
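The PS-PDU generation rule above can be sketched in Python as follows. The device generates one PS-PDU per unique purge LSP, keyed by LSP ID and sequence number, even if the same purge LSP arrives from several source tracing-incapable neighbors. Names and values are illustrative and are not part of the device implementation.

```python
def generate_ps_pdus(received_purges):
    """received_purges: list of (lsp_id, seq_num, from_neighbor) tuples
    for purge LSPs received from source tracing-incapable neighbors."""
    seen = set()
    ps_pdus = []
    for lsp_id, seq_num, neighbor in received_purges:
        key = (lsp_id, seq_num)
        if key in seen:          # same LSP ID and sequence number: no extra PS-PDU
            continue
        seen.add(key)
        ps_pdus.append({"lsp_id": lsp_id, "seq": seq_num, "purge_source": neighbor})
    return ps_pdus

purges = [("00.1111.1111.1111.00-00", 7, "C"),
          ("00.1111.1111.1111.00-00", 7, "D"),   # duplicate purge LSP
          ("00.2222.2222.2222.00-00", 3, "C")]
print(len(generate_ps_pdus(purges)))  # 2
```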
Security Concern
IS-IS purge LSP source tracing uses a UDP port to receive and send source tracing packets.
Therefore, the security of the port must be taken into consideration.
IS-IS purge LSP source tracing inevitably increases packet receiving and sending workload
and intensifies bandwidth pressure. To minimize the impact on IS-IS, the number of source
tracing packets must be controlled.
Authentication
Source tracing is embedded in IS-IS and uses IS-IS authentication parameters to
authenticate packets.
Generalized TTL security mechanism (GTSM)
GTSM is a security mechanism that checks whether the time to live (TTL) value in each
received IP packet header is within a pre-defined range.
IS-IS purge LSP source tracing packets can be flooded as far as one hop. If the TTL of a
packet is 255 when it is sent and not 254 when it is received, the packet will be
discarded.
CPU-CAR
The NP chip on interface boards can check the packets to be sent to the CPU for
processing and prevent the main control board from being overloaded by a large number
of packets that are sent to the CPU.
IS-IS purge LSP source tracing needs to apply for an independent CAR channel and has
small committed information rate (CIR) and committed burst size (CBS) values
configured.
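The GTSM check described above can be sketched as follows: source tracing packets are sent with a TTL of 255 and flooded at most one hop, so a packet whose received TTL is not 254 is discarded. The constants follow the text; the function name is illustrative.

```python
SENT_TTL = 255
EXPECTED_RECV_TTL = 254  # exactly one hop away from the sender

def gtsm_accept(received_ttl):
    """Accept a source tracing packet only if its TTL shows it
    traveled exactly one hop (255 at sending, 254 on receipt)."""
    return received_ttl == EXPECTED_RECV_TTL

print(gtsm_accept(254))  # True: accepted
print(gtsm_accept(253))  # False: traveled more than one hop, discarded
```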
Typical Scenarios
Scenario where all nodes support source tracing
All nodes on the network support source tracing, and Device A is the faulty source. Figure
1-845 shows the networking.
When Device A purges an IS-IS LSP, it floods a source tracing packet that carries Device A
information and brief information about the LSP. Then the packet is flooded on the network
hop by hop. After the fault occurs, maintenance personnel can log in to any node on the
network to locate Device A that keeps sending purge LSPs and isolate Device A from the
network.
Scenario where source tracing-incapable nodes are not isolated from source
tracing-capable nodes
All nodes on the network except Device C support source tracing, and Device A is the faulty
source. In this case, the PS-PDU can be flooded on the entire network. Figure 1-846 shows the
networking.
Figure 1-846 Scenario where source tracing-incapable nodes are not isolated from source
tracing-capable nodes
When Device A purges an IS-IS LSP, it floods a source tracing packet that carries Device A
information and brief information about the LSP. Then the packet is flooded on the network
hop by hop. When Device B and Device E negotiate the source tracing capability with Device
C, they find that Device C does not support source tracing. Therefore, after Device B receives
the source tracing packet from Device A, Device B sends the packet to Device D, but not to
Device C. After receiving the purge LSP from Device C, Device E generates a source tracing
packet which carries information about the advertisement source (Device E), purge source
(Device C), and the purge LSP, and floods the packet on the network.
After the fault occurs, maintenance personnel can log in to any node on the network except
Device C to locate the faulty node. Two possible faulty nodes can be located in this case:
Device A and Device C, and they both send the same purge LSP. In this case, Device A takes
precedence over Device C when the maintenance personnel determine the most probable
faulty source. After Device A is isolated, the network recovers. Then the possibility that
Device C is the faulty node is ruled out.
Scenario where source tracing-incapable nodes are isolated from source tracing-capable
nodes
All nodes on the network except Device C and Device D support source tracing, and Device A is
the faulty source. In this case, the PS-PDU cannot be flooded on the entire network. Figure 1-847
shows the networking.
Figure 1-847 Scenario where source tracing-incapable nodes are isolated from source
tracing-capable nodes
When Device A purges an IS-IS LSP, it floods a source tracing packet that carries Device A
information and brief information about the LSP. However, the source tracing packet can
reach only Device B because Device C and Device D do not support IS-IS purge LSP source
tracing.
During source tracing capability negotiation, Device E finds that Device C does not support
source tracing, and Device F finds that Device D does not support source tracing. After
Device E receives the purge LSP from Device C, Device E helps Device C generate and flood
a source tracing packet. Similarly, after Device F receives the purge LSP from Device D,
Device F helps Device D generate and flood a source tracing packet.
After the fault occurs, if maintenance personnel log in to Device A or Device B, the personnel
can locate the faulty source (Device A) directly. After Device A is isolated, the network
recovers. If the maintenance personnel log in to Device E, Device F, Device G, or Device H,
the personnel will find that Device E claims Device C to be the faulty source and Device F
claims Device D to be the faulty source. Then the personnel log in to Device C and Device D
and find that the purge LSP was sent by Device B, not generated by Device C or Device D.
Then the personnel log in to Device B, determine that Device A is the faulty node, and isolate
Device A. After Device A is isolated, the network recovers.
1.10.8.2.16 IS-IS MT
With IS-IS multi-topology (MT), IPv6, multicast, and advanced topologies can have their own
routing tables. This feature prevents packet loss if an integrated topology and the IPv4/IPv6
dual stack are deployed, isolates multicast services from unicast routes, improves network
resource usage, and reduces network construction cost.
Introduction
On a traditional IP network, IPv4 and IPv6 share the same integrated topology, and only one
unicast topology exists, which causes the following problems:
Packet loss if the IPv4/IPv6 dual stack is deployed: If some routers and links in an
IPv4/IPv6 topology do not support IPv4 or IPv6, they cannot receive IPv4 or IPv6
packets sent from the router that supports the IPv4/IPv6 dual stack. As a result, these
packets are discarded.
Multicast services highly depending on unicast routes: Only one unicast forwarding table
is available on the forwarding plane because only one unicast topology exists, which
forces services transmitted from one router to the same destination address to share the
same next hop, and various end-to-end services, such as voice and data services, to share
the same physical links. As a result, some links may be heavily congested while others
remain relatively idle. In addition, the multicast reverse path forwarding (RPF) check
depends on the unicast routing table. If the default unicast routing table is used when
transmitting multicast services, multicast services depend heavily on unicast routes, a
multicast distribution tree cannot be planned independently of unicast routes, and unicast
route changes affect multicast distribution tree establishment.
Deploying multiple topologies for different services on a physical network can address these
problems. IS-IS MT transmits multicast information by defining new TLVs in IS-IS packets.
Users can deploy multiple logical topologies on a physical network based on IP protocols or
service types supported by links so that separate SPF calculation operations are performed in
different topologies, which improves network usage.
If an IPv4 or IPv6 BFD session is Down in a topology on a network enabled with MT, neighbors of the
IPv4 or IPv6 address family will be affected.
Related Concepts
IS-IS MT allows multiple route selection subsets to be deployed on a versatile network
infrastructure and divides a physical network into multiple logical topologies, where each
topology performs its own SPF calculations.
IS-IS MT, an extension of IS-IS, allows multiple topologies to be applied to IS-IS. IS-IS MT
complies with standard protocols and transmits multicast information using new TLVs in
IS-IS packets. Users can deploy multiple logical topologies on a physical network. Each
topology performs its own SPF calculations and maintains its own routing table. Traffic of
different services, including the traffic transmitted in different IP topologies, has its own
optimal forwarding path.
The MT ID configured on an interface identifies the topology bound to the interface. One or
more MT IDs can be configured on a single interface.
With RPF check, upon receiving a packet, a router searches multicast static, unicast,
Multiprotocol Border Gateway Protocol (MBGP), and Multicast Interior Gateway Protocol
(MIGP) routing tables for an optimal route and sets it as the RPF route to the source IP
address of the packet. The packet can be transmitted only when it is destined for the RPF
interface.
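The RPF check described above can be sketched in Python as follows: the router looks up the RPF route to the packet's source address and accepts the packet only if it arrived on that route's RPF interface. The table contents and names are hypothetical, and the best-route selection among the four routing tables is assumed to have been done already.

```python
def rpf_check(source_ip, in_interface, rpf_table):
    """rpf_table maps a source IP address to the RPF interface of the
    optimal route toward that source (already selected from the multicast
    static, unicast, MBGP, and MIGP routing tables)."""
    rpf_interface = rpf_table.get(source_ip)
    return rpf_interface is not None and rpf_interface == in_interface

rpf_table = {"10.1.1.1": "Interface1"}
print(rpf_check("10.1.1.1", "Interface1", rpf_table))  # True: passes the RPF check
print(rpf_check("10.1.1.1", "Interface2", rpf_table))  # False: discarded
```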
Implementation
In IS-IS MT, the MT ID varies with the topology. Each Hello packet or LSP sent by a router
carries one or more MT TLVs of the topologies to which the source interface belongs. If the
router receives from a neighbor a Hello packet or LSP that carries only some of the local MT
TLVs, the router assumes that the neighbor belongs to only the default IPv4 topology. On a
point-to-point (P2P) link, an adjacency cannot be established between two neighbors that
share no common MT ID, whereas on a broadcast link, an adjacency can still be established
in this case.
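The adjacency rule above can be sketched as follows: on a P2P link, the two neighbors must share at least one MT ID; on a broadcast link, the adjacency forms even without a common MT ID. The function name and MT ID values are illustrative.

```python
def can_form_adjacency(link_type, local_mt_ids, neighbor_mt_ids):
    """Return whether an IS-IS adjacency can be established under IS-IS MT."""
    if link_type == "broadcast":
        return True                              # broadcast: no common MT ID required
    # P2P: at least one topology must be shared
    return bool(set(local_mt_ids) & set(neighbor_mt_ids))

print(can_form_adjacency("p2p", {0, 2}, {2}))    # True: MT ID 2 in common
print(can_form_adjacency("p2p", {2}, {0}))       # False: no common MT ID
print(can_form_adjacency("broadcast", {2}, {0})) # True
```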
Figure 1-848 shows the MT TLV format.
The following section describes separation of the IPv4 topology from the IPv6 topology, and
the multicast topology from the unicast topology.
Figure 1-849 shows the networking for separation of the IPv4 topology from the IPv6
topology. The values in the networking diagram are link costs. Device A, Device C, and
Device D support the IPv4/IPv6 dual stack; Device B supports IPv4 only and cannot
forward IPv6 packets.
Figure 1-849 Separation of the IPv4 topology from the IPv6 topology
Without IS-IS MT, Device A, Device B, Device C, and Device D use the IPv4/IPv6
topology for the SPF calculation. In this case, the shortest path from Device A to Device
D is Device A -> Device B -> Device D. IPv6 packets cannot reach Device D through
Device B because Device B does not support IPv6.
If a separate IPv6 topology is set up using IS-IS MT, Device A chooses only IPv6 links
to forward IPv6 packets. In this case, the shortest path from Device A to Device D is
Device A -> Device C -> Device D.
Figure 1-850 shows the networking for separation of the unicast and multicast
topologies using IS-IS MT.
Figure 1-850 Separation of the multicast topology from the unicast topology
On the network shown in Figure 1-850, all routers are interconnected using IS-IS. A TE
tunnel is set up between Device A (ingress) and Device E (egress). The outbound
interface of the route calculated by IS-IS may not be a physical interface but a TE tunnel
interface. In this case, Device C through which the TE tunnel passes cannot set up
multicast forwarding entries. As a result, multicast services cannot be transmitted.
IS-IS MT addresses this problem by establishing separate unicast and multicast
topologies. TE tunnels are excluded from a multicast topology. Therefore, multicast
services are unaffected by TE tunnels.
Background
When multicast and an IGP Shortcut-enabled MPLS TE tunnel are configured on a network,
the outbound interface of the route calculated by IS-IS may not be a physical interface but a
TE tunnel interface. Multicast Join packets are transparent to routers through which the TE
tunnel passes. Therefore, these routers cannot generate multicast forwarding entries and
discard received multicast packets from the multicast source. Figure 1-851 shows the conflict
between multicast and a TE tunnel.
2. After Device B receives the Join packet, it selects TE-Tunnel 1/0/0 as the reverse path
forwarding (RPF) interface and adds an MPLS label to the packet before forwarding it
from Interface2 to Device C.
3. As the penultimate hop of the TE tunnel, Device C removes the MPLS label from the
Join packet before forwarding it from Interface2 to Device D. Because the forwarding is
based on MPLS, Device C does not generate any multicast forwarding entries.
4. After Device D receives the Join packet, it generates a multicast forwarding entry in
which the upstream and downstream interfaces are Interface1 and Interface2,
respectively. Device D then sends the Join packet to Device E, which has already
established the shortest path tree.
5. Multicast packets flow from Server to Device C through Device E and Device D. Device
C discards these packets because no multicast forwarding entry is available. As a result,
multicast services are interrupted.
IS-IS local multicast topology (MT) can address this problem.
Related Concepts
IS-IS local MT is a mechanism that enables the routing management (RM) module to create a
separate multicast topology on the local device so that protocol packets exchanged between
devices are not erroneously discarded. When the outbound interface of the route calculated by
IS-IS is an IGP Shortcut-enabled TE tunnel interface, IS-IS local MT calculates a physical
outbound interface for the route. This mechanism resolves the conflict between multicast and
a TE tunnel.
Implementation
Figure 1-852 shows how multicast packets are processed after IS-IS local MT is enabled.
1. Establishment of a multicast IGP (MIGP) routing table
Device B creates an independent MIGP routing table, records the TE tunnel interface,
and generates multicast routing entries for multicast packet forwarding. If the outbound
interface of a calculated route is a TE tunnel interface, IS-IS calculates a physical
outbound interface for the route and adds the route to the MIGP routing table.
2. Multicast packet forwarding
When forwarding multicast packets, a router searches the unicast routing table for a route.
If the next hop of the route is a tunnel interface, the router searches the MIGP routing
table for the physical outbound interface to forward multicast packets. In this example,
the original outbound interface of the route is TE tunnel 1/0/0. IS-IS re-calculates a
physical outbound interface (Interface2) for the route and adds the route to the MIGP
routing table. Device B then forwards multicast packets through Interface2 based on the
MIGP routing table and generates a multicast routing entry in the multicast routing table.
Therefore, multicast services are properly forwarded.
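The two-step lookup above can be sketched in Python: multicast forwarding first consults the unicast routing table, and if the outbound interface is a TE tunnel, the MIGP routing table supplies the physical outbound interface instead. The table entries and interface names are hypothetical.

```python
def multicast_out_interface(dest_prefix, unicast_table, migp_table):
    """Return the interface used to forward multicast traffic for dest_prefix."""
    out = unicast_table[dest_prefix]
    if out.startswith("Tunnel"):         # IGP Shortcut-enabled TE tunnel interface
        return migp_table[dest_prefix]   # physical interface calculated by local MT
    return out

unicast = {"10.2.0.0/16": "Tunnel1/0/0"}
migp = {"10.2.0.0/16": "Interface2"}
print(multicast_out_interface("10.2.0.0/16", unicast, migp))  # Interface2
```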
Usage Scenario
IS-IS local MT prevents multicast services from being interrupted on networks that allow
multicasting and have an IGP Shortcut-enabled TE tunnel.
Benefits
Local MT resolves the conflict between multicast and a TE tunnel and improves multicast
service reliability.
The first eight bytes are common to all IS-IS PDUs. Figure 1-853 shows the IS-IS PDU format.
Maximum Area Address: maximum number of area addresses supported by an IS-IS area.
The value 0 indicates that a maximum of three area addresses are supported by this IS-IS
area.
Type/Length/Value (TLV): encoding type that features high efficiency and expansibility.
Each type of PDU contains a different TLV. Table 1-247 shows the mapping between
TLV codes and PDU types.
P2P IIHs: Figure 1-855 shows the format of IIHs on a P2P network.
As shown in Figure 1-855, most fields in a P2P IIH are the same as those in a LAN IIH.
The P2P IIH does not have the priority and LAN ID fields but has a local circuit ID field.
The local circuit ID indicates the local link ID.
LSP Format
LSPs are used to exchange link-state information. There are two types of LSPs: Level-1 and
Level-2. Level-1 IS-IS transmits Level-1 LSPs. Level-2 IS-IS transmits Level-2 LSPs.
Level-1-2 IS-IS can transmit both Level-1 and Level-2 LSPs.
Level-1 and Level-2 LSPs have the same format, as shown in Figure 1-856.
SNP Format
SNPs describe the LSPs in all or some of the databases and are used to synchronize and
maintain all LSDBs. SNPs consist of complete SNPs (CSNPs) and partial SNPs (PSNPs).
CSNPs carry summaries of all LSPs in LSDBs, which ensures LSDB synchronization
between neighboring routers. On a broadcast network, the designated intermediate
system (DIS) sends CSNPs at an interval. The default interval is 10 seconds. On a P2P
link, neighboring devices send CSNPs only when a neighbor relationship is established
for the first time.
Figure 1-857 shows the CSNP format.
1.10.8.3 Applications
1.10.8.3.1 IS-IS MT
Figure 1-859 shows the use of IS-IS MT to separate an IPv4 topology from an IPv6 topology.
Device A, Device C, and Device D support IPv4/IPv6 dual-stack; Device B supports IPv4
only and cannot forward IPv6 packets.
Figure 1-859 Separation of the IPv4 topology from the IPv6 topology
If IS-IS MT is not used, Device A, Device B, Device C, and Device D consider the IPv4 and
IPv6 topologies the same when using the SPF algorithm for route calculation. The shortest
path from Device A to Device D is Device A -> Device B -> Device D. Device B does not
support IPv6 and cannot forward IPv6 packets to Device D.
If IS-IS MT is used to establish a separate IPv6 topology, Device A chooses only IPv6 links to
forward IPv6 packets. The shortest path from Device A to Device D changes to Device A ->
Device C -> Device D. IPv6 packets are then forwarded.
Figure 1-860 shows the use of IS-IS MT to separate unicast and multicast topologies.
Figure 1-860 Separation of the multicast topology from the unicast topology
All routers in Figure 1-860 are interconnected using IS-IS. A TE tunnel is set up between
Device A (ingress) and Device E (egress). The outbound interface of the route calculated by
IS-IS may not be a physical interface but a TE tunnel interface. The routers between which
the TE tunnel is established cannot set up multicast forwarding entries. As a result, multicast
services cannot run properly.
IS-IS MT is configured to solve this problem by establishing separate unicast and multicast
topologies. TE tunnels are excluded from a multicast topology; therefore, multicast services
can run properly, without being affected by TE tunnels.
1.10.8.4 Appendixes
Feature     Supported by IPv4    Supported by IPv6    Differences
IS-IS TE    Yes                  No                   -
1.10.9 BGP
1.10.9.1 Introduction
BGP Definition
Border Gateway Protocol (BGP) is a dynamic routing protocol used between Autonomous
Systems (ASs). BGP is widely used by Internet Service Providers (ISPs).
BGP-1, BGP-2, and BGP-3 are three earlier versions of BGP. They are used to exchange
reachable inter-AS routes, establish inter-AS paths, avoid routing loops, and apply routing
policies between ASs.
Currently, BGP-4 is used.
BGP has the following characteristics:
Unlike an Interior Gateway Protocol (IGP), such as Open Shortest Path First (OSPF) and
Routing Information Protocol (RIP), BGP is an Exterior Gateway Protocol (EGP) which
controls route advertisement and selects optimal routes between ASs rather than
discovering or calculating routes.
BGP uses the Transmission Control Protocol (TCP) as the transport layer protocol, which
enhances BGP reliability.
− BGP selects inter-AS routes, which poses high requirements on stability. Therefore,
using TCP enhances BGP's stability.
− BGP peers must be logically connected through TCP. The destination port number
is 179 and the local port number is a random value.
BGP supports Classless Inter-Domain Routing (CIDR).
When routes are updated, BGP transmits only the updated routes, which reduces
bandwidth consumption during BGP route distribution. Therefore, BGP is applicable to
the Internet where a large number of routes are transmitted.
BGP is a distance-vector routing protocol.
BGP is designed to prevent loops.
− Between ASs: BGP routes carry information about the ASs along the path. The
routes that carry the local AS number are discarded to prevent inter-AS loops.
− Within an AS: BGP does not advertise routes learned in an AS to BGP peers in the
AS to prevent intra-AS loops.
BGP provides many routing policies to flexibly select and filter routes.
BGP provides a mechanism that prevents route flapping, which effectively enhances
Internet stability.
BGP can be easily extended.
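The inter-AS loop prevention rule listed above can be sketched as follows: a route received from an EBGP peer whose AS_Path already contains the local AS number is discarded. The AS numbers are illustrative.

```python
LOCAL_AS = 65001  # hypothetical local AS number

def accept_ebgp_route(as_path, local_as=LOCAL_AS):
    """Discard routes whose AS_Path already carries the local AS number,
    which indicates an inter-AS routing loop."""
    return local_as not in as_path

print(accept_ebgp_route([65002, 65003]))         # True: accepted
print(accept_ebgp_route([65002, 65001, 65003]))  # False: loop detected, discarded
```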
BGP4+ Definition
As a dynamic routing protocol used between ASs, BGP4+ is an extension of BGP.
Traditional BGP4 manages IPv4 routing information but does not support the inter-AS
transmission of packets encapsulated by other network layer protocols (such as IPv6).
To support IPv6, BGP4 must have the additional ability to associate an IPv6 protocol with the
next hop information and network layer reachability information (NLRI).
Purpose
BGP transmits route information between ASs. However, it is not required in all scenarios.
1.10.9.2 Principles
1.10.9.2.1 Basic Principle
BGP Messages
BGP runs by sending five types of messages: Open, Update, Notification, Keepalive, and
Route-refresh.
Open: The first message sent after a TCP connection is set up is an Open message, which
is used to set up BGP peer relationships. After a peer receives an Open message and the
peer negotiation is successful, the peer sends a Keepalive message to confirm and
maintain the peer relationship. Then, peers can exchange Update, Notification, Keepalive,
and Route-refresh messages.
Update: This type of message is used to exchange routes between BGP peers.
− An Update message can advertise multiple reachable routes with the same attributes.
These route attributes are applicable to all destination addresses (expressed by IP
prefixes) in the Network Layer Reachability Information (NLRI) field of the Update
message.
− An Update message can be used to delete multiple unreachable routes. Each route is
identified by its destination address (using the IP prefix), which identifies the routes
previously advertised between BGP speakers.
− An Update message can be used only to delete routes. In this case, it does not need
to carry the route attributes or NLRI. In addition, an Update message can be used
only to advertise reachable routes. In this case, it does not need to carry information
about the deleted routes.
Notification: When BGP detects an error, it sends a Notification message to its peer. The
BGP connection is then torn down immediately.
Keepalive: BGP periodically sends Keepalive messages to peers to maintain peer
relationships.
Route-refresh: This type of message is used to request that the peer resend all reachable
routes.
If all BGP routers are enabled with the Route-refresh capability and the import policy of
BGP changes, the local BGP router sends a Route-refresh message to its peers. After
receiving the Route-refresh message, the peers resend their routing information to the
local BGP router. In this manner, BGP routing tables are dynamically refreshed and new
routing policies are used without tearing down BGP connections.
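All five message types above share the common BGP message header defined in RFC 4271: a 16-byte marker of all ones, a 2-byte total length, and a 1-byte type (Open = 1, Update = 2, Notification = 3, Keepalive = 4, Route-refresh = 5 per RFC 2918). A Keepalive message is just the 19-byte header. The following sketch encodes that header; the function name is illustrative.

```python
import struct

TYPES = {"Open": 1, "Update": 2, "Notification": 3,
         "Keepalive": 4, "Route-refresh": 5}

def bgp_message(msg_type, body=b""):
    """Build a BGP message: 16-byte all-ones marker, 2-byte length, 1-byte type."""
    marker = b"\xff" * 16
    length = 19 + len(body)              # header is always 19 bytes
    return marker + struct.pack("!HB", length, TYPES[msg_type]) + body

keepalive = bgp_message("Keepalive")
print(len(keepalive))   # 19
```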
BGP Processing
BGP adopts TCP as its transport layer protocol. Therefore, a TCP connection must be
available between the peers. BGP peers negotiate parameters by exchanging Open
messages to establish a BGP peer relationship.
After the peer relationship is established, BGP peers exchange BGP routing tables. BGP
does not periodically update a routing table. When BGP routes change, BGP updates the
changed BGP routes in the BGP routing table by sending Update messages.
BGP sends Keepalive messages to maintain the BGP connection between peers.
After detecting an error on a network, BGP sends a Notification message to report the
error and the BGP connection is torn down.
BGP Attributes
BGP route attributes are a set of parameters that describe specific BGP routes. With BGP
route attributes, BGP can filter and select routes. BGP route attributes are classified into the
following types:
Well-known mandatory: This type of attribute can be identified by all BGP routers and
must be carried in Update messages. Without this attribute, errors occur in the routing
information.
Well-known discretionary: This type of attribute can be identified by all BGP routers.
This type of attribute is optional and, therefore, is not necessarily carried in Update
messages.
Optional transitive: This indicates the transitive attribute between ASs. A BGP router
may not recognize this attribute, but the router still receives it and advertises it to other
peers.
Optional non-transitive: If a BGP router does not recognize this type of attribute, the
router does not advertise it to other peers.
The most common BGP route attributes are as follows:
Origin
The Origin attribute defines the origin of a route. The Origin attribute is classified into
the following types:
− Interior Gateway Protocol (IGP): This attribute type has the highest priority. IGP is
the Origin attribute for routes obtained through an IGP in the AS from which the
routes originate. For example, the Origin attribute of the routes imported to the BGP
routing table using the network command is IGP.
− Exterior Gateway Protocol (EGP): This attribute type has the second highest
priority. The Origin attribute of the routes obtained through EGP is EGP.
− Incomplete: This attribute type has the lowest priority. Incomplete is the Origin
attribute type of all routes that do not have the IGP or EGP Origin attribute. For
example, the Origin attribute of the routes imported using the import-route
command is Incomplete.
AS_Path
The AS-Path attribute records all ASs through which a route passes from the local end to
the destination in distance-vector (DV) order.
When a BGP speaker advertises a local route:
− When advertising the route beyond the local AS, the BGP speaker adds the local AS
number to the AS_Path list and then advertises it to the neighboring routers through
Update messages.
− When advertising the route within the local AS, the BGP speaker creates an empty
AS_Path list in an Update message.
When a BGP speaker advertises a route learned from the Update messages of another
BGP speaker:
− When advertising the route beyond the local AS, the BGP speaker adds the local AS
number to the left of the AS_Path list. From the AS_Path attribute, the BGP router
that receives the route learns the ASs through which the route passes to the
destination. The number of the AS that is nearest to the local AS is placed on the left
of the list, while other AS numbers are listed in sequence.
− When advertising the route within the local AS, the BGP speaker does not change
the AS_Path attribute.
The AS_Path attribute has four types:
− AS_Sequence: records in reverse order all the ASs through which a route passes
from the local device to the destination.
− AS_Set: records without an order all the ASs through which a route passes
from the local device to the destination. The AS_Set attribute is used in route
summarization scenarios. After route summarization, the device records the
unsequenced AS numbers because it cannot sequence the numbers of ASs through
which specific routes pass. No matter how many AS numbers an AS_Set contains,
BGP regards the AS_Set as one AS number when calculating routes.
− AS_Confed_Sequence: records in reverse order all the sub-ASs within a BGP
confederation through which a route passes from the local device to the destination.
− AS_Confed_Set: records without an order all the sub-ASs within a BGP
confederation through which a route passes from the local device to the destination.
The AS_Confed_Set attribute is used in route summarization scenarios in a
confederation.
The AS_Confed_Sequence and AS_Confed_Set attributes are used to prevent routing
loops and to select routes among the various sub-ASs in a confederation.
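The AS_Path update rules above can be sketched as follows: when a route is advertised beyond the local AS, the local AS number is prepended on the left of the AS_Path list; when it is advertised within the local AS, the attribute is unchanged. The AS numbers are illustrative; confederation handling is omitted.

```python
LOCAL_AS = 65001  # hypothetical local AS number

def as_path_on_advertise(as_path, to_ebgp_peer):
    """Return the AS_Path carried in the Update message sent to a peer."""
    if to_ebgp_peer:
        return [LOCAL_AS] + list(as_path)  # prepend local AS on the left
    return list(as_path)                   # IBGP peer: AS_Path unchanged

print(as_path_on_advertise([65002, 65003], to_ebgp_peer=True))   # [65001, 65002, 65003]
print(as_path_on_advertise([65002, 65003], to_ebgp_peer=False))  # [65002, 65003]
```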
Next_Hop
Different from the Next_Hop attribute in an IGP, the Next_Hop attribute in BGP is not
necessarily the IP address of a neighboring router. In most cases, the Next_Hop attribute
in BGP complies with the following rules:
− When advertising a route to an EBGP peer, a BGP speaker sets the Next_Hop of the
route to the address of the local interface through which the BGP peer relationship
is established.
− When advertising a locally generated route to an IBGP peer, a BGP speaker sets the
Next_Hop of the route to the address of the local interface through which the BGP
peer relationship is established.
− When advertising a route learned from an EBGP peer to an IBGP peer, the BGP
speaker does not change the Next_Hop of the route.
MED
The Multi-Exit-Discriminator (MED) is transmitted only between two neighboring ASs.
The AS that receives the MED does not advertise it to a third AS.
Similar to the cost used by an IGP, the MED is used to determine the optimal route when
traffic enters an AS. When a BGP peer learns multiple routes that have the same
destination address but different next hops from EBGP peers, the route with the smallest
MED value is selected as the optimal route if all other attributes are the same.
Local_Pref
The Local_Pref attribute indicates the BGP priority of a route. It is available only to
IBGP peers and is not advertised to other ASs.
The Local_Pref attribute is used to determine the optimal route when traffic leaves an AS.
When a BGP router obtains multiple routes to the same destination address but with
different next hops through IBGP peers, the route with the largest Local_Pref value is
selected.
For details about route import, see Route Import; for details about BGP route selection rules,
see BGP Route Selection; for details about route summarization, see Route Summarization;
for details about advertising routes to BGP peers, see BGP Route Advertisement.
For details about import or export policies, see "Routing Policies" in NE20E Feature
Description — IP Routing.
Route Import
BGP itself cannot discover routes. Therefore, it needs to import other protocol routes, such as
IGP routes or static routes, to the BGP routing table. Imported routes can be transmitted
within an AS or between ASs.
With load balancing, if the preceding conditions are equal and multiple external routes with the same
AS_Path are available, load balancing is performed among them. The number of routes load-balancing
traffic must be less than or equal to the configured number. After the load-balancing as-path-ignore
command is run, the routes with different AS_Path values can load-balance traffic.
8. Prefers the route with the Origin type as IGP, EGP, and Incomplete in descending order.
9. Prefers the route with the smallest MED value.
If the bestroute med-plus-igp command is run, BGP preferentially selects the route with the smallest
sum of MED multiplied by a MED multiplier and IGP cost multiplied by an IGP cost multiplier.
− BGP compares the MEDs of only routes from the same AS (excluding
confederation sub-ASs). MEDs of two routes are compared only when the first AS
number in the AS_Sequence (excluding AS_Confed_Sequence) of one route is the
same as its counterpart in the other route.
− If a route does not carry MED, BGP considers its MED as the default value (0)
during route selection. If the bestroute med-none-as-maximum command is run,
BGP considers its MED as the largest MED value (4294967295).
− If the compare-different-as-med command is run, BGP compares MEDs of routes
even when the routes are received from peers in different ASs. Do not run this
command unless the ASs use the same IGP and route selection mode. Otherwise, a
loop may occur.
− If the deterministic-med command is run, routes are no longer selected in the
sequence in which they are received.
10. Prefers local VPN routes, LocalCross routes, and RemoteCross routes in descending
order.
LocalCross routes indicate local VPN cross routes or routes imported between public network and VPN
instances.
If the ERT of a VPNv4 route in the routing table of a VPN instance on a PE matches the
IRT of another VPN instance on the PE, the VPNv4 route is added to the routing table of
the second VPN instance. This route is called a LocalCross route. If the ERT of a VPNv4
route learned from a remote PE matches the IRT of a VPN instance on the local PE, the
VPNv4 route is added to the routing table of that VPN instance. This route is called a
RemoteCross route.
11. Prefers EBGP routes to IBGP routes.
12. Prefers the route that is iterated to an IGP route with the smallest cost.
If the bestroute igp-metric-ignore command is run, BGP no longer compares the IGP
cost.
13. Prefers the route with the shortest Cluster_List length.
By default, Cluster_List takes precedence over Router ID during BGP route selection. To enable Router
ID to take precedence over Cluster_List during BGP route selection, run the bestroute
routerid-prior-clusterlist command.
14. Prefers the route advertised by the router with the smallest router ID.
If the bestroute router-id-ignore command is run, router IDs do not determine which
route is selected for BGP.
If each route carries an Originator_ID, the originator IDs rather than router IDs are compared during
route selection. The route with the smallest Originator_ID is preferred.
15. Prefers the route learned from the peer with the smallest IP address.
16. If BGP Flow Specification routes are configured locally, the first configured BGP Flow
Specification route is preferentially selected.
17. Prefers the locally imported route in the RM routing table.
If direct, static, and IGP routes are imported, BGP prefers the direct route, then the static
route, and then the IGP route, in descending order.
18. Prefers the Add-Path route with the smallest received path ID.
19. Prefers the RemoteCross route with the largest RD.
20. Prefers locally received routes to the routes imported between VPN and public network
instances.
21. Prefers the route that was learned the earliest.
For details about BGP route attributes, see 1.10.9.2.1 Basic Principle.
For details about the BGP route selection process, see Figure 1-864.
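A few of the tie-break steps above can be sketched as an ordered comparison. This is a hedged illustration only, covering steps 8, 9, 11, 12, and 14 with simplified, hypothetical route records; it is not the device's internal implementation, and real selection applies many more steps and conditions.

```python
# Illustrative tie-break cascade: Origin, MED, EBGP-vs-IBGP, IGP cost,
# router ID. Field names are assumptions for this sketch.

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}  # lower is better

def select_best(routes):
    """Return the preferred route by applying tie-breakers in order."""
    def key(r):
        return (
            ORIGIN_RANK[r["origin"]],   # step 8: Origin type
            r.get("med", 0),            # step 9: smallest MED (default 0)
            0 if r["ebgp"] else 1,      # step 11: EBGP over IBGP
            r["igp_cost"],              # step 12: smallest IGP cost
            r["router_id"],             # step 14: smallest router ID
        )
    return min(routes, key=key)

routes = [
    {"origin": "igp", "med": 100, "ebgp": False, "igp_cost": 20, "router_id": "1.1.1.1"},
    {"origin": "igp", "med": 100, "ebgp": True,  "igp_cost": 30, "router_id": "2.2.2.2"},
]
best = select_best(routes)  # the EBGP route wins at step 11
```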
Route Summarization
On a large-scale network, the BGP routing table can be very large. Route summarization can
reduce the size of the routing table.
Route summarization is the process of summarizing specific routes with the same IP prefix
into a summarized route. After route summarization, BGP advertises only the summarized
route rather than all specific routes to BGP peers.
BGP supports automatic and manual route summarization.
Automatic route summarization: takes effect on the routes imported by BGP. With
automatic route summarization, the specific routes for the summarization are suppressed,
and BGP summarizes routes based on the natural network segment and sends only the
summarized route to BGP peers. For example, 10.1.1.1/24 and 10.2.1.1/24 are
summarized into 10.0.0.0/8, which is a Class A address.
Manual route summarization: takes effect on routes in the local BGP routing table. With
manual route summarization, users can control the attributes of the summarized route
and determine whether to advertise the specific routes.
IPv4 supports both automatic and manual route summarization, while IPv6 supports only
manual route summarization.
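The classful behavior of automatic summarization described above can be sketched as follows. This is an illustrative model of collapsing a prefix to its natural (Class A/B/C) network, matching the 10.x example; it is not the router's implementation.

```python
# Sketch of automatic route summarization: each imported route is
# collapsed to its natural classful network, and only the classful
# summary would be advertised to peers.
import ipaddress

def classful_summary(prefix):
    """Return the natural (classful) network for an IPv4 prefix."""
    net = ipaddress.ip_network(prefix, strict=False)
    first_octet = int(str(net.network_address).split(".")[0])
    if first_octet < 128:
        bits = 8    # Class A
    elif first_octet < 192:
        bits = 16   # Class B
    else:
        bits = 24   # Class C
    return str(net.supernet(new_prefix=bits))
```

For example, both 10.1.1.0/24 and 10.2.1.0/24 collapse to the single summary 10.0.0.0/8.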
1.10.9.2.3 AIGP
Background
The Accumulated Interior Gateway Protocol Metric (AIGP) attribute is an optional
non-transitive Border Gateway Protocol (BGP) path attribute. The attribute type code
assigned by the Internet Assigned Numbers Authority (IANA) for the AIGP attribute is 26.
Routing protocols, such as IGPs that have been designed to run within a single administrative
domain, generally assign a metric to each link, and then choose the path with the smallest
metric as the optimal path between two nodes. BGP, designed to provide routing over a large
number of independent administrative domains, does not select paths based on metrics. If a
single administrative domain runs several contiguous BGP networks, it is desirable for BGP
to select paths based on metrics, just as an IGP does. The AIGP attribute enables BGP to
select paths based on metrics.
Related Concepts
An AIGP administrative domain is a set of autonomous systems (ASs) in a common
administrative domain. The AIGP attribute takes effect only in an AIGP administrative
domain. Figure 1-865 shows the networking diagram of AIGP application.
Implementation
AIGP Attribute Origination
The AIGP attribute can be added to a route only through a route-policy. You can configure a
route-policy to add an AIGP value when routes are imported, received, or advertised. If no AIGP
value is configured, BGP routes do not carry the AIGP attribute.
AIGP Attribute Delivery
BGP does not allow the AIGP attribute to leak out of an AIGP administrative domain
boundary onto the Internet. If the AIGP attribute of a route changes, BGP sends Update
packets for BGP peers to update information about this route. In a scenario in which A, a BGP
speaker, sends a route that carries the AIGP attribute to B, its BGP peer:
If B does not support the AIGP attribute or does not have the AIGP capability enabled for
a peer, B ignores the AIGP attribute and does not transmit the AIGP attribute to other
BGP peers.
If B supports the AIGP attribute and has the AIGP capability enabled for a peer, B can
modify the AIGP attribute of the route only after B has set itself to be the next hop of the
route. To modify the AIGP attribute of the route, B complies with the following rules:
− If the BGP peer relationship between A and B is established over an IGP route, or a
static route that does not require recursive next hop resolution, B uses the IGP or
static route metric value plus the received AIGP attribute value as the new AIGP
attribute value of the received route and sends the new AIGP attribute along with
the route to other BGP peers.
− If the BGP peer relationship between A and B is established over a BGP route, or a
static route that requires recursive next hop resolution, route iteration occurs when
B sends data to A. Each route iteration requires a pre-existing route. B uses the sum
of metric values for iterated routes along the path from B to A plus the received
AIGP attribute value as the new AIGP attribute value of the received route and
sends the new AIGP attribute along with the route to other BGP peers.
Role of the AIGP Attribute in BGP Route Selection
If multiple active routes exist between two nodes, BGP will make a route selection decision.
If BGP cannot determine the optimal route based on PrefVal, Local_Pref, and Route-type,
BGP compares the AIGP attributes of these routes. BGP route selection rules are as follows:
If BGP cannot determine the optimal route based on Route-type, BGP compares the
AIGP attributes. If this method still cannot determine the optimal route, BGP proceeds to
compare the AS_Path attributes.
The priority of a route that carries the AIGP attribute is higher than the priority of a route
that does not carry the AIGP attribute.
If all routes carry the AIGP attribute, the route with the smallest AIGP attribute value
plus the IGP metric value of the iterated next hop is preferred over the other routes.
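The two AIGP behaviors described above can be sketched in a few lines: a speaker that sets itself as the next hop adds the metric toward the previous hop to the received AIGP value, and route selection prefers the smallest sum of AIGP value and the IGP metric of the iterated next hop. Names and structures here are assumptions for illustration, not the device's data model.

```python
# Hedged sketch of AIGP accumulation and AIGP-based preference.

def updated_aigp(received_aigp, metric_to_peer):
    """New AIGP value advertised after the speaker becomes the next hop."""
    return received_aigp + metric_to_peer

def prefer_by_aigp(routes):
    """Prefer AIGP-carrying routes; among them, pick the smallest
    AIGP value plus the IGP metric of the iterated next hop."""
    with_aigp = [r for r in routes if r.get("aigp") is not None]
    if not with_aigp:
        return None  # fall through to later tie-breakers (AS_Path, ...)
    return min(with_aigp, key=lambda r: r["aigp"] + r["nh_igp_metric"])
```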
Usage Scenario
The AIGP attribute is used to select the optimal route in an AIGP administrative domain.
Benefits
After the AIGP attribute is configured in an AIGP administrative domain, BGP selects paths
based on metrics, just as an IGP. Consequently, all devices in the AIGP administrative domain
use the optimal routes to forward data.
If route flapping occurs, a router sends an Update packet to its peers. After the peers receive
the Update packet, they recalculate routes and update their routing tables. Frequent route
flapping consumes lots of bandwidth and CPU resources and can even affect network
operations.
Route dampening can address this problem. In most cases, BGP is deployed on complex
networks where routes change frequently. To reduce the impact of frequent route flapping,
BGP adopts route dampening to suppress unstable routes.
BGP dampening measures route stability using a penalty value. The greater the penalty value,
the less stable a route. Each time route flapping occurs (a device receives a Withdraw or an
Update packet), BGP adds a penalty value to the route carried in the packet. If a route changes
from active to inactive, the penalty value increases by 1000. If a route is updated when it is
active, the penalty value increases by 500. When the penalty value of a route exceeds the
Suppress value, the route is suppressed. As a result, BGP does not add the route to the routing
table or advertise any Update message to BGP peers.
The penalty value of a suppressed route reduces by half after a half-life period. When the
penalty value decreases to the Reuse value, the route becomes reusable, and BGP adds the
route to the IP routing table and advertises an Update packet carrying the route to BGP peers.
The penalty value, suppression threshold, and half-life are configurable. Figure 1-866 shows
the process of BGP route dampening.
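The dampening arithmetic above can be modeled directly: each flap adds a penalty (1000 for an active-to-inactive transition, 500 for an update while active), the penalty halves every half-life period, and the Suppress and Reuse thresholds determine the route's state. The threshold and half-life values below are examples only, since these parameters are configurable.

```python
# Illustrative model of route dampening penalty decay and thresholds.
import math

SUPPRESS = 2000
REUSE = 750
HALF_LIFE = 900.0  # seconds (example value; configurable in practice)

def decayed(penalty, elapsed):
    """Penalty after exponential half-life decay."""
    return penalty * math.pow(0.5, elapsed / HALF_LIFE)

penalty = 0.0
for _ in range(3):                  # three withdraw/re-advertise flaps, 60 s apart
    penalty = decayed(penalty, 60) + 1000
suppressed = penalty > SUPPRESS     # the route is now suppressed

# After two half-life periods, the penalty falls below Reuse and the
# route becomes reusable again.
reusable_later = decayed(penalty, 2 * HALF_LIFE) < REUSE
```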
Route dampening applies only to EBGP routes and VPNv4 IBGP routes. IBGP routes (except
VPNv4 IBGP routes) cannot be dampened because IBGP routing tables contain the routes
from the local AS, which require that the forwarding entries be the same on IBGP peers in the
AS. If IBGP routes are dampened, the forwarding entries may be inconsistent because
dampening parameters may vary among these IBGP peers.
Well-known Community
Table 1-248 lists well-known communities of BGP routes.
Usage Scenario
On the network shown in Figure 1-867, EBGP connections are established between Device A
and Device B, and between Device B and Device C. If the community attribute of No_Export
is configured on Device A and Device A sends a route with the community attribute to Device
B, Device B does not advertise the route to other ASs after receiving it.
In Figure 1-868, there are multiple BGP routers in AS 200. To reduce the number of IBGP
connections, AS 200 is divided into three sub-ASs: AS 65001, AS 65002, and AS 65003. In
AS 65001, fully meshed IBGP connections are established between the three routers.
BGP speakers outside a confederation, such as Router F in AS 100, do not know the existence
of the sub-ASs (AS 65001, AS 65002, and AS 65003) in the confederation. The confederation
ID is the AS number that is used to identify the entire confederation. For example, AS 200 in
Figure 1-868 is the confederation ID.
Applications
After an RR receives routes from its peers, it selects the optimal route based on BGP route
selection policies and performs one of the following operations:
If the optimal route is from a non-client IBGP peer, the RR advertises the route to all
clients.
If the optimal route is from a client, the RR advertises the route to all non-clients and
clients.
If the optimal route is from an EBGP peer, the RR advertises the route to all clients and
non-clients.
An RR is easy to deploy because it needs to be configured only on the RR itself; clients
do not need to know that they are clients.
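The reflection rules listed above can be summarized as a simple decision on the source of the best route. This is a minimal sketch with simplified peer-category labels, not device configuration or behavior in every corner case.

```python
# Sketch of RR advertisement rules by best-route source.

def reflect_targets(source, clients, non_clients):
    """Return the peers an RR advertises a best route to, by source type."""
    if source == "non_client_ibgp":
        return list(clients)                      # to all clients only
    if source in ("client", "ebgp"):
        return list(clients) + list(non_clients)  # to clients and non-clients
    raise ValueError("unknown source")
```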
On some networks, if fully meshed connections have already been established among clients
of an RR, they can exchange routing information directly. In this case, route reflection among
the clients through the RR is unnecessary and occupies bandwidth. For example, on the
NE20E, route reflection can be disabled, but the routes between clients and non-clients are
still exchanged. By default, route reflection between clients through the RR is enabled.
On the NE20E, an RR can change various attributes of BGP routes, such as the AS_Path,
MED, Local_Pref, and community attributes.
Originator_ID
Originator_ID and Cluster_List are used to detect and prevent routing loops.
The Originator_ID attribute is four bytes long and is generated by an RR. It carries the router
ID of the route originator in the local AS.
When a route is reflected by an RR for the first time, the RR adds the Originator_ID to
this route. If a route already carries the Originator_ID attribute, the RR does not create a
new one.
After receiving the route, a BGP speaker checks whether the Originator_ID is the same
as its router ID. If Originator_ID is the same as its router ID, the BGP speaker discards
this route.
Cluster_List
To prevent routing loops between ASs, a BGP router uses the AS_Path attribute to record the
ASs through which a route passes. Routes with the local AS number are discarded by the
router. To prevent routing loops within an AS, IBGP peers do not advertise routes learned
from the local AS.
With RR, IBGP peers can advertise routes learned from the local AS to each other. However,
the Cluster_List attribute must be deployed to prevent routing loops within the AS.
An RR and its clients form a cluster. In an AS, each RR is uniquely identified by a
Cluster_ID.
Similar to an AS_Path, a Cluster_List is composed of a series of Cluster_IDs and is generated
by an RR. The Cluster_List records all the RRs through which a route passes.
Before an RR reflects a route between its clients or between its clients and non-clients,
the RR adds the local Cluster_ID to the head of the Cluster_List. If a route does not carry
any Cluster_List, the RR creates one for the route.
After the RR receives an updated route, it checks the Cluster_List of the route. If the RR
finds that its cluster ID is included in the Cluster_List, the RR discards the route. If its
cluster ID is not included in the Cluster_List, the RR adds its cluster ID to the
Cluster_List and then reflects the route.
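The two loop checks described above can be sketched together: an RR prepends its Cluster_ID before reflecting and discards routes whose Cluster_List already contains it, while any BGP speaker discards routes whose Originator_ID equals its own router ID. The route structure here is an illustrative assumption.

```python
# Sketch of Originator_ID and Cluster_List loop prevention.

def rr_reflect(route, cluster_id):
    """Return the route to reflect, or None if a cluster loop is detected."""
    cluster_list = route.get("cluster_list", [])
    if cluster_id in cluster_list:
        return None                    # local Cluster_ID already present: discard
    reflected = dict(route)
    reflected["cluster_list"] = [cluster_id] + cluster_list
    # Add Originator_ID only on first reflection; keep an existing one.
    reflected.setdefault("originator_id", route["from_router_id"])
    return reflected

def accept(route, my_router_id):
    """A BGP speaker discards routes that originated from itself."""
    return route.get("originator_id") != my_router_id
```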
Backup RR
To enhance network reliability and prevent single points of failure, more than one route
reflector needs to be configured in a cluster. The route reflectors in the same cluster must
share the same Cluster_ID to prevent routing loops.
With backup RRs, clients can receive multiple routes to the same destination from different
RRs. The clients then apply route selection policies to choose the optimal route.
On the network shown in Figure 1-870, RR1 and RR2 are in the same cluster. RR1 and RR2
establish an IBGP connection so that each RR is a non-client of the other RR.
If Client 1 receives an updated route from an external peer, Client 1 advertises the route
to RR1 and RR2 through IBGP.
After receiving the updated route, RR1 reflects the route to other clients (Client 2 and
Client 3) and the non-client (RR2) and adds the local Cluster_ID to the head of the
Cluster_List.
After receiving the reflected route, RR2 checks the Cluster_List. RR2 finds that its
Cluster_ID is contained in the Cluster_List; therefore, it discards the updated route.
If RR1 and RR2 are configured with different Cluster_IDs, each RR receives both the route
from Client 1 and the updated route reflected from the other RR. Therefore, configuring the
same Cluster_ID for RR1 and RR2 reduces the number of routes that each RR receives and
memory consumption.
The application of Cluster_List prevents routing loops among RRs in the same AS.
Multiple Clusters in an AS
Multiple clusters may exist in an AS. RRs are IBGP peers of each other. An RR can be
configured as a client or non-client of another RR.
For example, the backbone network shown in Figure 1-871 is divided into multiple clusters.
Each RR is configured as a non-client of the other RRs, and these RRs are fully meshed. Each
client establishes IBGP connections with only the RR in the same cluster. In this manner, all
BGP peers in the AS can receive reflected routes.
Hierarchical Reflector
Hierarchical reflectors are deployed on live networks. On the network shown in Figure 1-872,
the ISP provides Internet routes for AS 100. Two EBGP connections are established between
the ISP and AS 100. AS 100 is divided into two clusters. The four routers in Cluster 1 are core
routers.
Two Level-1 RRs (RR-1s) are deployed in Cluster 1, which ensures the reliability of the
core layer of AS 100. The other two routers in the core layer are clients of RR-1s.
One Level-2 RR (RR-2) is deployed in Cluster 2. RR-2 is a client of RR-1.
In Figure 1-874, PEs have the same VPN instance (vpna) and RTs (including the ERT and
IRT). The RD configured for PE2 and PE3 is 2:2, and the RD configured for PE4 is 3:3. Site 2
has a route destined for 10.1.1.0/24. The route is sent to PE2, PE3, and PE4, which convert this
route to multiple BGP VPNv4 routes and send them to PE1. On receipt of the BGP VPNv4
routes, PE1 implements route crossing as shown in Figure 1-875. The detailed process is as
follows:
1. After receiving the BGP VPNv4 routes from PE2, PE3, and PE4, PE1 adds them to the
BGP VPNv4 routing table.
2. PE1 converts the BGP VPNv4 routes to BGP VPN routes by removing their RDs, adds
the BGP VPN routes to the routing table of the VPN instance, selects an optimal route
from the BGP VPN routes based on BGP route selection policies, and adds the optimal
BGP VPN route to the IP VPN instance routing table.
1.10.9.2.10 MP-BGP
Conventional BGP-4 manages only IPv4 unicast routing information, and inter-AS
transmission of packets of other network layer protocols, such as IPv6 and multicast, is
limited.
To support multiple network layer protocols, the Internet Engineering Task Force (IETF)
extends BGP-4 to Multiprotocol Extensions for BGP-4 (MP-BGP). MP-BGP is forward
compatible. Specifically, routers supporting MP-BGP can communicate with the routers that
do not support MP-BGP.
Extended Attributes
BGP-4 Update packets carry three IPv4-related attributes: NLRI (Network Layer Reachability
Information), Next_Hop, and Aggregator. Aggregator contains the IP address of the BGP
speaker that performs route aggregation.
To carry information about multiple network layer protocols in NLRI and Next_Hop,
MP-BGP introduces the following route attributes:
Address Family
The Address Family Information field consists of a 2-byte Address Family Identifier (AFI)
and a 1-byte Subsequent Address Family Identifier (SAFI).
BGP uses address families to distinguish different network layer protocols. For the values of
address families, see relevant standards. The NE20E supports multiple MP-BGP extension
applications, such as VPN extension and IPv6 extension, which are configured in their
respective address family views.
For details about the BGP VPNv4 address family and BGP VPN instance address family, see
the HUAWEI NE20E-S2 Universal Service Router Feature Description - VPN.
BGP Authentication
BGP can work properly only after BGP peer relationships are established. Authenticating BGP
peers can improve BGP security. BGP supports the following authentication modes:
MD5 authentication
BGP uses TCP as the transport layer protocol. Message Digest 5 (MD5) authentication
can be used when establishing TCP connections to improve BGP security. MD5
authentication sets the MD5 authentication password for the TCP connection, and TCP
performs the authentication. If the authentication fails, the TCP connection cannot be
established.
Keychain authentication
Keychain authentication is performed on the application layer. It ensures smooth service
transmission and improves security by periodically changing the password and
encryption algorithms. When keychain authentication is configured for BGP peer
relationships over TCP connections, BGP packets as well as the establishment process of
a TCP connection can be authenticated. For details about keychain, see "Keychain" in
HUAWEI NE20E-S2 Feature Description - Security.
GTSM
During network attacks, attackers may simulate BGP packets and continuously send them to a
router. If the packets are destined for the router, the router directly sends them to the control
plane for processing without validating them. As a result, the increased processing workload
on the control plane leads to high CPU usage.
The Generalized TTL Security Mechanism (GTSM) defends against attacks by checking
whether the time to live (TTL) value in each IP packet header is within a pre-defined range.
TTL refers to the maximum number of routers through which a packet can pass.
In actual networking, the GTSM either allows packets whose TTL values are outside the
specified range to pass or discards them. To configure the GTSM to discard such packets, set
an appropriate TTL value range according to the network topology. Packets whose TTL values
are outside the specified range are then discarded, which protects the local device against
potential attacks.
You can also enable the log function to record discarded packets for further fault location.
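The GTSM check above reduces to a range test on the TTL. As a hedged sketch: an EBGP peer a known number of hops away should deliver packets whose TTL falls in [255 - valid hops + 1, 255]; the valid-hop count is an assumed configuration parameter for this illustration.

```python
# Minimal sketch of the GTSM TTL range check.

MAX_TTL = 255

def gtsm_pass(ttl, valid_hops):
    """True if the packet's TTL is within the trusted range."""
    return (MAX_TTL - valid_hops + 1) <= ttl <= MAX_TTL
```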
RPKI
Resource Public Key Infrastructure (RPKI) improves BGP security by validating the origin
ASs of BGP routes.
Attackers can steal user data by advertising routes that are more specific than those advertised
by carriers. For example, if a carrier has advertised a route destined for 10.10.0.0/16, an
attacker can advertise a route destined for 10.10.153.0/24, which is more specific than
10.10.0.0/16. According to the longest match rule, 10.10.153.0/24 is preferentially selected for
traffic forwarding. As a result, the attacker succeeds in intercepting user data.
To address this issue, establish an RPKI session between a router and an RPKI server. The
router will then query Route Origin Authorizations (ROAs) from the RPKI server through the
RPKI session and match the origin AS of each received BGP route against the ROAs. This
mechanism ensures that only the routes that originate from the trusted ASs are accepted. The
validation result can also be applied to BGP route selection to ensure that hosts in the local AS
can communicate with hosts in other ASs.
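The ROA matching described above can be sketched as a three-state validation: a route is Valid if a covering ROA authorizes its origin AS within the ROA's maximum prefix length, Invalid if covering ROAs exist but none match, and NotFound if no ROA covers the prefix. The ROA tuple layout here is an assumption for illustration.

```python
# Hedged sketch of RPKI origin validation against a ROA list.
import ipaddress

def validate_origin(prefix, origin_as, roas):
    """roas: list of (roa_prefix, max_length, authorized_as) tuples."""
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_len, asn in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        if net.subnet_of(roa_net):
            covered = True
            if net.prefixlen <= max_len and origin_as == asn:
                return "valid"
    return "invalid" if covered else "not_found"
```

With the carrier's ROA for 10.10.0.0/16 (max length 16), the attacker's more-specific 10.10.153.0/24 announcement validates as Invalid and can be rejected.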
1.10.9.2.12 BGP GR
Graceful restart (GR) is one of the high availability (HA) technologies, which comprise a
series of comprehensive technologies such as fault-tolerant redundancy, link protection, faulty
node recovery, and traffic engineering. As a fault-tolerant redundancy technology, GR ensures
normal forwarding of data during the restart of routing protocols to prevent interruption of
key services. Currently, GR has been widely applied to the master/slave switchover and
system upgrade.
GR is usually used when the active route processor (RP) fails because of a software or
hardware error, or used by an administrator to perform the master/slave switchover.
Related Concepts
The concepts related to GR are as follows:
GR Restarter: indicates a device that performs master/slave switchover triggered by the
administrator or a failure. A GR Restarter must support GR.
GR Helper: indicates the neighbor of a GR Restarter. A GR Helper must support GR.
GR session: indicates a session, through which a GR Restarter and a GR Helper can
negotiate GR capabilities.
GR time: indicates the period during which the GR Helper, after detecting that the GR
Restarter is Down, retains the topology information or routes obtained from the GR Restarter.
End-of-RIB (EOR): indicates a marker that notifies a BGP peer that the initial route
update after session negotiation is complete.
EOR timer: indicates the maximum time that a local device waits for the EOR information
from a peer. If the local device does not receive the EOR information within this period,
it selects an optimal route from the routes received so far.
Principles
Principles of BGP GR are as follows:
1. During BGP peer relationship establishment, devices negotiate GR capabilities by
sending supported GR capabilities to each other.
2. When detecting the master/slave switchover of the GR Restarter, a GR Helper does not
delete the routing information and forwarding entries related to the GR Restarter within
the GR time, but waits to re-establish a BGP connection with the GR Restarter.
3. After the master/slave switchover, the GR Restarter receives routes from all the
negotiated peers with GR capabilities before the switchover, and starts the EOR timer.
The GR Restarter selects a route when either of the following conditions is met:
− The GR Restarter receives the EOR information from all peers, and the EOR timer is
deleted.
− The EOR timer expires before the GR Restarter receives the EOR information from
all peers.
4. The GR Restarter sends the optimal route to the GR Helper and the GR Helper starts the
EOR timer. The GR Helper quits GR when either of the following conditions is met:
− The GR Helper receives the EOR information from the GR Restarter and the EOR
timer is deleted.
− The EOR timer times out and the GR Helper receives no EOR information from the
GR Restarter.
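The EOR-based decision in steps 3 and 4 above reduces to a simple condition: route selection (or GR exit) proceeds once EOR has arrived from every peer, or once the EOR timer expires, whichever comes first. The following is a toy sketch of that condition, with timing abstracted away.

```python
# Sketch of the EOR decision used by both the GR Restarter and Helper.

def should_select_routes(peers_with_eor, all_peers, timer_expired):
    """Proceed when EOR has arrived from all peers or the timer expired."""
    return set(all_peers) <= set(peers_with_eor) or timer_expired
```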
GR Reset
Currently, BGP does not support dynamic capability negotiation. Therefore, each time a new
BGP capability (such as the IPv4, IPv6, VPNv4, and VPNv6 capabilities) is enabled on a BGP
speaker, the BGP speaker tears down existing sessions with its peer and renegotiates BGP
capabilities. This process will interrupt ongoing services.
To prevent the service interruptions, the NE20E provides the GR reset function that enables
the NE20E to reset a BGP session in GR mode. With the GR reset function configured, when
you enable a new BGP capability on the BGP speaker, the BGP speaker enters the GR state,
resets the BGP session, and renegotiates BGP capabilities with the peer. In the whole process,
the BGP speaker re-establishes the existing sessions but does not delete the routing entries for
the existing sessions, so that the existing services are not interrupted.
Benefits
With BGP GR, forwarding is not interrupted. In addition, BGP flapping occurs only on the
neighbors of the GR Restarter rather than in the entire routing domain. This is important for
BGP, which needs to process a large number of routes.
Networking
As shown in Figure 1-876, Device A and Device B belong to ASs 100 and 200, respectively.
The two routers are directly connected and establish an External Border Gateway Protocol
(EBGP) peer relationship.
BFD is enabled to detect the EBGP peer relationship between Device A and Device B. If the
link between Device A and Device B fails, BFD can quickly detect the fault and notify BGP.
Background
As IPv6 technology becomes more popular, an increasing number of separate IPv6 networks
take shape. IPv6 provider edge (6PE), a technology designed to provide IPv6 services over
IPv4 networks, allows service providers to provide IPv6 services without constructing IPv6
backbone networks. The 6PE solution connects separate IPv6 networks using multiprotocol
label switching (MPLS) tunnels. The 6PE solution implements IPv4/IPv6 dual stack on the
provider edge devices (PEs) of Internet service providers and uses the Multi-protocol
Extensions for Border Gateway Protocol (MP-BGP) to assign labels to IPv6 routes. In this
manner, the 6PE solution connects separate IPv6 networks over IPv4 tunnels between PEs.
Related Concepts
In practical application, different metropolitan area networks (MANs) of a service provider or
collaborative backbone networks of different service providers often span multiple
autonomous systems (ASs). The 6PE solution can be intra-AS 6PE or inter-AS 6PE,
depending on whether the separate IPv6 networks connect to the same AS. Standard protocols
define three inter-AS 6PE modes: inter-AS 6PE OptionB, inter-AS 6PE OptionB with
autonomous system boundary routers (ASBRs) as PEs, and inter-AS 6PE OptionC. This
section describes the following 6PE modes:
Intra-AS 6PE: Separate IPv6 networks are connected by the same AS. PEs in the AS
exchange IPv6 routes by establishing MP-IBGP peer relationships.
Inter-AS 6PE OptionB: ASBRs in different ASs exchange labeled IPv6 routes by
establishing MP-EBGP peer relationships.
Inter-AS 6PE OptionB (with ASBRs as PEs): ASBRs in different ASs exchange IPv6
routes using MP-EBGP.
Inter-AS 6PE OptionC: PEs in different ASs exchange labeled IPv6 routes over
multi-hop MP-EBGP peer sessions.
Intra-AS 6PE
Figure 1-877 shows intra-AS 6PE networking. 6PE runs on the edge of a service provider
network. PEs that connect to IPv6 networks are IPv4/IPv6 dual-stack devices. PEs and
customer edge devices (CEs) exchange IPv6 routes using the IPv6 Interior Gateway Protocol
(IGP), or IPv6 External Border Gateway Protocol (EBGP). PEs exchange IPv4 routes with
each other or with provider devices (Ps) using an IPv4 routing protocol. PEs must establish
tunnels to transparently transmit IPv6 packets. PEs often use MPLS label switched paths
(LSPs) and MPLS Local IFNET tunnels. By default, a PE uses an MPLS LSP to transmit IPv6
packets. If no MPLS LSP is available, a PE uses an MPLS Local IFNET tunnel to transmit
IPv6 packets.
Figure 1-878 shows route and packet transmission in an intra-AS 6PE scenario. I-L indicates
an inner label, and O-L indicates an outer label. The outer label directs the packet to the BGP
next hop, and the inner label identifies the outbound interface or CE to which the packet
should be forwarded.
The route transmission process is as follows:
1. CE2 sends an IPv6 route to PE2, its EBGP peer.
2. Upon receipt, PE2 changes the next hop of the IPv6 route to itself and assigns a label to
the IPv6 route. Then, PE2 sends the labeled IPv6 route to PE1, its IBGP peer.
3. Upon receipt, PE1 relays the labeled IPv6 route to a tunnel and adds information about
the route to the local forwarding table. Then, PE1 changes the next hop of the route to
itself, removes the label of the route, and sends the route to CE1, its EBGP peer.
The IPv6 route transmission from CE2 to CE1 is complete.
The packet transmission process is as follows:
1. CE1 sends an ordinary IPv6 packet to PE1 over an IPv6 link on the public network.
2. Upon receipt, PE1 searches its local forwarding table for the forwarding entry based on
the destination address of the packet and encapsulates the packet with inner and outer
labels. Then, PE1 sends the IPv6 packet to PE2 over a public network tunnel.
3. Upon receipt, PE2 removes the inner and outer labels and forwards the IPv6 packet to
CE2 over an IPv6 link.
As a result, the IPv6 packet is transmitted from CE1 to CE2.
The route and packet transmission processes show that whether the public network is an IPv4
or IPv6 network does not matter to the CEs.
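The label operations in the packet flow above can be illustrated with a toy model: the ingress PE pushes an inner label (I-L) and an outer label (O-L) onto the IPv6 packet, and the egress PE pops both before forwarding the plain packet. Labels and payloads below are placeholders, not real MPLS encodings.

```python
# Toy illustration of 6PE label push/pop at the PEs.

def pe1_encapsulate(ipv6_packet, inner_label, outer_label):
    """Ingress PE: push I-L, then O-L (outer label on top of the stack)."""
    return {"labels": [outer_label, inner_label], "payload": ipv6_packet}

def pe2_decapsulate(mpls_packet):
    """Egress PE: pop both labels and recover the IPv6 packet."""
    assert len(mpls_packet["labels"]) == 2
    return mpls_packet["payload"]
```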
Inter-AS 6PE
Inter-AS 6PE OptionB (with ASBRs as PEs)
Figure 1-879 shows inter-AS 6PE OptionB (with ASBRs as PEs) networking. Inter-AS
6PE OptionB (with ASBRs as PEs) is similar to intra-AS 6PE. The only difference is that
in an inter-AS 6PE OptionB scenario in which ASBRs also function as PEs, ASBRs
establish EBGP peer relationships between each other. The route and packet transmission
processes in an inter-AS 6PE OptionB scenario in which ASBRs also function as PEs are
similar to those in an intra-AS 6PE scenario.
Figure 1-879 Networking diagram for inter-AS 6PE OptionB (with ASBRs as PEs)
Figure 1-881 shows route and packet transmission in an inter-AS 6PE OptionB scenario.
I-L indicates an inner label, and O-L indicates an outer label.
Figure 1-881 Route and packet transmission in an inter-AS 6PE OptionB scenario
Two inter-AS 6PE OptionC solutions are available, depending on the establishment methods of
end-to-end LSPs. In an inter-AS 6PE OptionC scenario, PEs establish multi-hop MP-EBGP peer
relationships to exchange labeled IPv6 routes and establish end-to-end BGP LSPs to transmit IPv6
packets. The way in which an end-to-end BGP LSP is established does not matter much to inter-AS 6PE
OptionC and therefore is not described here.
Figure 1-883 shows route and packet transmission in an inter-AS 6PE OptionC scenario.
I-L indicates an inner label, B-L indicates a BGP LSP label, and O-L indicates an outer
label.
Figure 1-883 Route and packet transmission in an inter-AS 6PE OptionC scenario
Usage Scenarios
Each 6PE mode has its advantages and usage scenarios. The intra-AS 6PE mode is best suited
for scenarios in which separate IPv6 networks connect to the same AS. Inter-AS 6PE modes
are best suited for scenarios in which separate IPv6 networks connect to different ASs. Table
1-249 lists the usage scenarios for inter-AS 6PE modes.
Benefits
6PE offers the following benefits:
Easy maintenance: All configurations are performed on PEs and network maintenance is
simple. IPv6 services are carried over IPv4 networks, but the users on IPv6 networks are
unaware of IPv4 networks.
Low network construction costs: Service providers can provide IPv6 services over
existing MPLS networks without upgrading the networks. 6PE devices can provide
multiple types of services, such as IPv6 VPN and IPv4 VPN.
Applications
On the network shown in Figure 1-884, Device A and Device B are directly connected, and
prefix-based ORF is enabled on both of them. After negotiating the prefix-based ORF capability
with Device B, Device A adds its local prefix-based inbound policy to a Route-Refresh packet
and sends the packet to Device B. Device B uses the information in the packet to work out an
outbound policy for advertising routes to Device A.
As shown in Figure 1-885, Device A and Device B are clients of the RR in the domain, and
prefix-based ORF is enabled on all three NEs. After negotiating the prefix-based ORF capability
with the RR, Device A and Device B add their local prefix-based inbound policies to
Route-Refresh packets and send the packets to the RR. The RR uses the information in the
Route-Refresh packets to work out the outbound policies for reflecting routes to Device A and
Device B.
Usage Scenario
On the network shown in Figure 1-886, Device Y advertises a learned BGP route to Device
X2 and Device X3 in AS 100; Device X2 and Device X3 then advertise the BGP route to
Device X1 through RR. Therefore, Device X1 receives two routes whose next hops are
Device X2 and Device X3 respectively. Then, Device X1 selects a route based on a
configured routing policy. Assume that the route sent by Device X2 (Link A) is preferred. The
route sent by Device X3 (Link B) then functions as a backup link.
If a node along Link A fails or Link A itself becomes faulty, the next hop of the route from
Device X1 to Device X2 becomes unavailable. If BGP Auto FRR is enabled on Device X1, the
forwarding plane quickly switches the traffic from Device X1 to Device Y to Link B, which
ensures uninterrupted traffic transmission. In addition, Device X1 reselects the route sent by
Device X3 based on the forwarding prefixes and then updates the FIB table.
Usage Scenario
The BGP dynamic update peer-groups feature is applicable to the following scenarios:
Scenario with an international gateway
Scenario with an RR
Scenario where routes received from EBGP peers need to be sent to all IBGP peers
The preceding scenarios have in common that a router needs to send routes to a large number
of BGP peers, most of which share the same configuration. This situation is most evident in
the networking shown in Figure 1-888.
For example, an RR has 100 clients and needs to reflect 100,000 routes to them. If the RR
groups the routes for each peer before sending the routes to 100 clients, the total number of
times that all routes are grouped is 100,000 x 100. After the dynamic update peer-groups
feature is applied, the total number of times that all routes are grouped changes to 100,000 x 1.
The efficiency is 100 times higher than before.
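The grouping arithmetic above can be sketched as a simple cost model. This is an illustration only; the function name and structure are assumptions, not device code:

```python
# Illustrative cost model: compare the number of per-route grouping
# operations an RR performs with and without the dynamic update
# peer-groups feature. Without the feature, routes are effectively
# grouped once per peer; with it, once for the whole update group.

def grouping_operations(num_routes: int, num_peers: int,
                        dynamic_peer_groups: bool) -> int:
    groups = 1 if dynamic_peer_groups else num_peers
    return num_routes * groups

without_feature = grouping_operations(100_000, 100, dynamic_peer_groups=False)
with_feature = grouping_operations(100_000, 100, dynamic_peer_groups=True)
print(without_feature)                  # 10,000,000 (100,000 x 100)
print(with_feature)                     # 100,000 (100,000 x 1)
print(without_feature // with_feature)  # 100-fold reduction
```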
1.10.9.2.18 Active-Route-Advertise
Background
Active-route-advertise allows a device to advertise only optimal routes in an IP routing table.
Active-route-advertise prevents changes to data forwarding paths that may occur after
independent BGP route selection is deployed. Independent BGP route selection enables a
device to advertise an optimal BGP route in a BGP routing table to BGP peers, regardless of
whether this optimal BGP route is optimal in an IP routing table. Therefore, if a device is
upgraded to support independent BGP route selection, previously selected data forwarding
paths may change. If you do not expect the changes, configure active-route-advertise on the
device.
Related Concepts
IP routing table: stores routes that are optimal in each available protocol routing table, selects
optimal routes from the stored routes, and delivers the selected optimal routes to the
forwarding information base (FIB) table.
Independent BGP route selection: enables a device to advertise an optimal BGP route in a
BGP routing table to BGP peers, regardless of whether this optimal BGP route is optimal in
an IP routing table.
Implementation
As shown in Figure 1-890, an RR is deployed between Device A and Device B, an Open
Shortest Path First (OSPF) neighbor relationship is established between Device A and the RR,
an External BGP (EBGP) peer relationship is established between Device B and Device C,
and Device A and Device C are not directly connected.
The route 100.0.0.0/8 is imported to the BGP and OSPF routing tables on Device A, and the
RR learns both the BGP route 100.0.0.0/8 and OSPF route 100.0.0.0/8. By default, an OSPF
route has a higher priority than a BGP route. Therefore, the OSPF route 100.0.0.0/8 is an
optimal route in the IP routing table, and the BGP route 100.0.0.0/8 is inactive in the IP
routing table. Table 1-250 describes the changes after the RR is upgraded to support
independent BGP route selection, including the changes in the optimal route in the BGP
routing table, route advertisement, and data forwarding path.
Optimal route in the BGP routing table | The BGP route 100.0.0.0/8 is an optimal route in the BGP routing table, but not optimal in the IP routing table.
As described in Table 1-250, active-route-advertise ensures that the data forwarding path Link
B is still used after independent BGP route selection is deployed.
Before you configure active-route-advertise, check whether BGP routes are optimal routes in the IP
routing table. If all BGP routes are optimal routes in the IP routing table, data forwarding paths do not
change after active-route-advertise is configured. If only some BGP routes are optimal routes in the IP
routing table, analyze the impacts of active-route-advertise and determine whether to configure this
feature.
Advantages
Active-route-advertise prevents changes to data forwarding paths that may occur after
independent BGP route selection is deployed.
Purpose
2-byte autonomous system (AS) numbers used on networks range from 1 to 65535, and the
available AS numbers are close to exhaustion as networks expand. Therefore, the AS number
range needs to be extended. 4-byte AS numbers ranging from 1 to 4294967295 can address
this problem. New speakers that support 4-byte AS numbers can co-exist with old speakers
that support only 2-byte AS numbers.
Definition
4-byte AS numbers are extended from 2-byte AS numbers. Border Gateway Protocol (BGP)
peers use a new capability code and optional transitive attributes to negotiate the 4-byte AS
number capability and transmit 4-byte AS numbers. This mechanism enables communication
between new speakers and between old speakers and new speakers.
Open capability code (0x41), defined by standard protocols, indicates that the local end
supports 4-byte capability extension.
The following new optional transitive attributes, defined by standard protocols, are used to
transmit 4-byte AS numbers in old sessions:
AS4_Path, coded 0x11
AS4_Aggregator, coded 0x12
Related Concepts
New speaker: a peer that supports 4-byte AS numbers
Old speaker: a peer that does not support 4-byte AS numbers
New session: a BGP connection established between new speakers
Old session: a BGP connection established between a new speaker and an old speaker, or
between old speakers
Principles
BGP speakers negotiate capabilities by exchanging Open messages. Figure 1-891 shows the
format of Open messages exchanged between new speakers. The header of a BGP Open
message is fixed, and its My AS Number field is expected to carry the local AS number.
However, this field can carry only a 2-byte AS number. Therefore, a new speaker writes 23456
into My AS Number and carries its actual local AS number in the Optional parameters field
before it sends an Open message to a peer. After the peer receives the message, it can
determine whether the sender supports 4-byte AS numbers by checking the Optional
parameters field in the message.
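The rule above can be sketched with two hypothetical helpers: one computes the value placed in the fixed 2-byte My AS Number field, and one converts asdot notation (such as 2.2) to a plain 4-byte number. The names are illustrative, and the sketch assumes, per the standard, that 23456 (AS_TRANS) is substituted only when the local AS number does not fit in 2 bytes:

```python
AS_TRANS = 23456  # reserved 2-byte value that stands in for a 4-byte AS number

def my_as_number_field(local_as: int) -> int:
    """Value for the fixed 2-byte My AS Number field of an Open message.
    The real 4-byte AS number travels in the optional capability (0x41)."""
    return local_as if local_as <= 65535 else AS_TRANS

def asdot_to_asplain(asdot: str) -> int:
    """Convert x.y notation (e.g. '2.2') to a plain 4-byte AS number."""
    high, low = (int(part) for part in asdot.split("."))
    return high * 65536 + low

print(asdot_to_asplain("2.2"))                       # 131074
print(my_as_number_field(asdot_to_asplain("2.2")))   # 23456 (131074 > 65535)
print(my_as_number_field(65001))                     # 65001 (fits in 2 bytes)
```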
Figure 1-892 shows how peer relationships are established between new speakers, and
between an old speaker and a new speaker. BGP speakers notify each other of whether they
support 4-byte AS numbers by exchanging Open messages. After the capability negotiation,
new sessions are established between new speakers, and old sessions are established between
a new speaker and an old speaker.
AS_Path and Aggregator in Update messages exchanged between new speakers carry 4-byte
AS numbers, while AS_Path and Aggregator in Update messages sent by an old speaker
carry 2-byte AS numbers.
When a new speaker sends an Update message carrying an AS number greater than
65535 to an old speaker, the new speaker uses AS4_Path and AS4_Aggregator to assist
AS_Path and AS_Aggregator in transferring 4-byte AS numbers. AS4_Path and
AS4_Aggregator are transparent to the old speaker. In the networking shown in Figure
1-893, before the new speaker in AS 2.2 sends an Update message to the old speaker in
AS 65002, the new speaker replaces each 4-byte AS number (2.2, 1.1) with 23456 in
AS_Path. Therefore, the AS_Path carried in the Update message is (23456, 23456,
65001), and the carried AS4_Path is (2.2, 1.1). After the old speaker in AS 65002
receives the Update message, it transparently transmits the message to other ASs.
When the new speaker receives an Update message carrying AS_Path, AS4_Path,
AS_Aggregator, and AS4_Aggregator from the old speaker, the new speaker uses the
reconstruction algorithm to reconstruct the actual AS_Path and AS_Aggregator. In the
networking shown in Figure 1-893, after the new speaker in AS 65003 receives an
Update message carrying AS_Path (65002, 23456, 23456, 65001) and AS4_Path (2.2,
1.1) from the old speaker in AS 65002, the new speaker reconstructs the actual AS_Path
(65002, 2.2, 1.1, 65001).
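The substitution in this example can be sketched as follows. This is a simplified illustration of the reconstruction described above; a real implementation also handles AS_Path segment types and length mismatches as defined by the standard:

```python
AS_TRANS = 23456  # 2-byte placeholder written into AS_Path by new speakers

def reconstruct_as_path(as_path, as4_path):
    """Substitute each AS_TRANS placeholder in AS_Path with the next
    real 4-byte AS number taken, in order, from AS4_Path."""
    real = iter(as4_path)
    return [next(real) if asn == AS_TRANS else asn for asn in as_path]

# In asplain notation, 2.2 is 131074 and 1.1 is 65537.
received_as_path = [65002, AS_TRANS, AS_TRANS, 65001]
received_as4_path = [131074, 65537]
print(reconstruct_as_path(received_as_path, received_as4_path))
# [65002, 131074, 65537, 65001], i.e. (65002, 2.2, 1.1, 65001)
```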
Adjusting the display format of 4-byte AS numbers affects the matching results of AS_Path
regular expressions and extended community filters. If you adjust the display format of
4-byte AS numbers on a system that uses an AS_Path regular expression or extended
community filter as the export or import policy, reconfigure the AS_Path regular expression
or extended community filter. If you do not reconfigure the AS_Path regular expression or
extended community filter, routes cannot match the export or import policy, and a network
error may occur.
Benefits
4-byte AS numbers alleviate AS number exhaustion and therefore are beneficial to carriers
who need to expand the network scale.
1.10.9.2.20 BMP
Background
The BGP Monitoring Protocol (BMP) is designed to monitor BGP running status, such as
BGP peer relationship establishment and termination and route updates.
Without BMP, manual query is required if you want to know about BGP running status.
With BMP, a router can be connected to a monitoring server and configured to report BGP
running statistics to the server for monitoring, which improves the network monitoring
efficiency.
BMP Messages
Routers send BMP packets carrying Initiation, Peer Up Notification (PU), Route Monitoring
(RM), Peer Down Notification (PD), Status Report (SR), or Termination messages to the
monitoring server to report BGP running statistics. The functions of these messages are listed
as follows:
Initiation message: Reports to the monitoring server such information as the router
vendor and its software version.
PU message: Notifies the monitoring server that a BGP peer relationship has been
established.
RM message: Sends to the monitoring server all routes received from BGP peers and
notifies the server of route addition or deletion in real time.
PD message: Notifies the monitoring server that a BGP peer has been disconnected.
SR message: Reports router running statistics to the monitoring server.
Termination message: Reports to the monitoring server the cause of BMP session
termination.
BMP sessions are unidirectional. Routers send messages to the monitoring server but ignore messages
replied by the server.
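Every message listed above is preceded by a common BMP header carrying the protocol version, message length, and message type. The following is a minimal parsing sketch (a hypothetical helper, not device code), with type codes as defined by the BMP standard:

```python
import struct

# BMP message type codes from the BMP standard.
BMP_MSG_TYPES = {
    0: "Route Monitoring",
    1: "Statistics Report",
    2: "Peer Down Notification",
    3: "Peer Up Notification",
    4: "Initiation",
    5: "Termination",
}

def parse_bmp_common_header(data: bytes):
    """Parse the 6-byte common header: version (1 byte),
    message length (4 bytes), message type (1 byte)."""
    version, length, msg_type = struct.unpack("!BIB", data[:6])
    return version, length, BMP_MSG_TYPES.get(msg_type, "Unknown")

# A BMPv3 header announcing a 32-byte Initiation message:
header = struct.pack("!BIB", 3, 32, 4)
print(parse_bmp_common_header(header))  # (3, 32, 'Initiation')
```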
Implementation
In Figure 1-894, a TCP connection is established between the monitoring server and PE1 and
between the monitoring server and PE2. PE1 and PE2 send unsolicited BMP packets to the
monitoring server to report BGP running statistics. After receiving these BMP packets, the
monitoring server parses them and displays the BGP running status in the monitoring view.
The BMP packets carry headers. By analyzing the headers, the monitoring server can
determine which BGP peers advertised the routes carried in these packets.
When establishing a connection between a router and a monitoring server, note the following
rules:
You can specify a port for the TCP connection between the router and the monitoring
server.
One router can connect to multiple monitoring servers, and one monitoring server can
also connect to multiple routers.
In each BMP instance, one router can connect to only one monitoring server.
The monitoring server monitors all BGP peers. Specifying the BGP peer to be monitored
is not supported.
Benefits
BMP facilitates the monitoring of BGP running status and reports security threats in real time
so that preventive measures can be taken promptly.
Background
If multiple routes to the same destination are available, a BGP device selects one optimal
route based on BGP route selection policies and advertises the route to its BGP peers.
For details about BGP route selection policies, see BGP Principles.
However, in scenarios with master and backup provider edges (PEs) or route reflectors (RRs),
if routes are selected based on the preceding policies and the primary link fails, the BGP route
convergence takes a long time because no backup route is available. To address this problem,
the BGP Best External feature was introduced.
Related Concepts
BGP Best External: A mechanism that enables a backup device to select a sub-optimal route
and send the route to its BGP peers if the route preferentially selected based on BGP route
selection policies is an Internal Border Gateway Protocol (IBGP) route advertised by the
master device. Therefore, BGP Best External speeds up BGP route convergence if the primary
link fails.
Best External route: The sub-optimal route selected after BGP Best External is enabled.
BGP Best External can be enabled on PE2 to address this problem. With BGP Best External,
PE2 selects the EBGP route from CE1 and advertises it to PE3. Therefore, PE3 has two routes
to 1.1.1.1/32, in which the route CE1 -> PE2 -> PE3 backs up CE1 -> PE1 -> PE3. Table
1-251 lists the differences with and without BGP Best External.
BGP Best External | Optimal Route | Route Available on PE3 | If the Link Between CE1 and PE1 Fails
Not enabled | CE1 -> PE1 -> PE3 | CE1 -> PE1 -> PE3 | A new route must be selected to take over traffic after routes are converged.
Enabled | CE1 -> PE1 -> PE3 | CE1 -> PE1 -> PE3; CE1 -> PE2 -> PE3 | Traffic is switched to CE1 -> PE2 -> PE3 immediately.
BGP Best External can be enabled on RR2 to address this problem. With BGP Best External,
RR2 selects the EBGP route from Device B and advertises it to Device C. Therefore, Device
C has two routes to 1.1.1.1/32, in which the route Device A -> Device B -> RR2 -> Device C
backs up Device A -> Device B -> RR1 -> Device C. Table 1-252 lists the differences with
and without BGP Best External.
BGP Best External | Optimal Route | Route Available on Device C | If the Link Between Device B and RR1 Fails
Not enabled | Device A -> Device B -> RR1 -> Device C | Device A -> Device B -> RR1 -> Device C | A new route must be selected to take over traffic after routes are converged.
Enabled | Device A -> Device B -> RR1 -> Device C | Device A -> Device B -> RR1 -> Device C; Device A -> Device B -> RR2 -> Device C | Traffic is switched to Device A -> Device B -> RR2 -> Device C immediately.
Usage Scenario
The BGP Best External feature applies to scenarios in which master and backup PEs or RRs
are deployed and the backup PE or RR needs to advertise the sub-optimal route (Best External
route) to its BGP peers to speed up BGP route convergence.
Advantages
As networks develop, services, such as voice over IP (VoIP), online video, and financial
services, pose higher requirements for real-time transmission. With BGP Best External, the
backup device selects the sub-optimal route and advertises the route to its BGP peers, which
speeds up BGP route convergence and minimizes service interruptions.
Background
In a scenario with a route reflector (RR) and clients, if the RR has multiple routes to the same
destination (with the same prefix), the RR selects an optimal route from these routes and then
sends only the optimal route to its clients. Therefore, the clients have only one route to the
destination. If a link along this route fails, route convergence takes a long time, which cannot
meet the requirements on high reliability.
To address this issue, deploy the BGP Add-Path feature on the RR. With BGP Add-Path, the
RR can send two or more routes with the same prefix to its clients. After reaching the clients,
these routes can back up each other or load-balance traffic, which ensures high reliability in
data transmission.
For details about route selection and advertisement policies, see 1.10.9.2.1 Basic Principle.
BGP Add-Path is deployed on RRs in most cases although it can be configured on any router.
With BGP Add-Path, you can configure the maximum number of routes with the same prefix that an
RR can send to its clients. The actual number of routes with the same prefix that an RR can send to
its clients is the smaller value between the configured maximum number and the number of
available routes with the same prefix.
Related Concepts
Add-Path routes: routes selected by BGP after BGP Add-Path is configured.
Typical Networking
On the network shown in Figure 1-897, Device A, Device B, and Device C are clients of the
RR, and Device D is an EBGP peer of Device B and Device C.
Each of Device B and Device C receives a route to 1.1.1.1/32 from Device D, with 9.1.1.1/24
and 9.1.2.1/24 as the next hops, respectively. Then, each of Device B and Device C sends the
received route to the RR. After receiving the two routes, the RR selects an optimal route from
them and sends it to Device A. Therefore, Device A has only one route to 1.1.1.1/32.
BGP Add-Path can be configured to allow the RR to send more than one route with the same
prefix to Device A. Suppose that the configured maximum number of routes with the same
prefix that the RR can send to Device A is 2 and that the optimal route selected by the RR is
the one from Device B. Table 1-253 lists the differences with and without BGP Add-Path.
Usage Scenario
The BGP Add-Path feature applies to scenarios in which an RR and clients are deployed and
the RR needs to send more than one route with the same prefix to its clients to ensure high
reliability in data transmission.
BGP Add-Path is used in traffic optimization scenarios and allows multiple routes to be sent
to the controller.
Benefits
Deploying BGP Add-Path can improve network reliability.
Background
In some scenarios, if a large number of routes are iterated to the same next hop that flaps
frequently, the system will be busy processing reselection and re-advertisement of these routes,
which consumes excessive resources and leads to high CPU usage. BGP iteration suppression
in case of next hop flapping can address this problem.
Principles
After this function is enabled, BGP maintains a penalty value that starts from 0 and, each time
the next hop flaps, adjusts it by comparing the flapping interval with the configured intervals.
When the penalty value exceeds 10, BGP suppresses route iteration to the corresponding next hop.
For example, if the intervals for increasing, retaining, and clearing the penalty value are T1,
T2, and T3, respectively, BGP calculates the penalty value as follows:
Increases the penalty value by 1 if the flapping interval is less than T1.
Retains the penalty value if the flapping interval is greater than or equal to T1, but less
than T2.
Reduces the penalty value by 1 if the flapping interval is greater than or equal to T2, but
less than T3.
Clears the penalty value if the flapping interval is greater than or equal to T3.
When the penalty value exceeds 10, the system processes reselection and re-advertisement of
the routes that are iterated to a flapping next hop much slower.
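The T1/T2/T3 rules above can be sketched as follows. This is an illustrative model; the function and variable names are assumptions, not device code:

```python
def update_penalty(penalty: int, flap_interval: float,
                   t1: float, t2: float, t3: float) -> int:
    """One penalty update per next-hop flap, following the rules above
    (T1 < T2 < T3). The penalty starts at 0 and never goes below 0;
    iteration to the next hop is suppressed once it exceeds 10."""
    if flap_interval < t1:
        return penalty + 1          # increase
    if flap_interval < t2:
        return penalty              # retain
    if flap_interval < t3:
        return max(penalty - 1, 0)  # reduce
    return 0                        # clear

# Three rapid flaps, each shorter than T1 = 60s, raise the penalty to 3.
penalty = 0
for interval in [1, 1, 1]:
    penalty = update_penalty(penalty, interval, t1=60, t2=120, t3=300)
print(penalty)  # 3
```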
Benefits
BGP iteration suppression in case of next hop flapping prevents the system from frequently
processing reselection and re-advertisement of a large number of routes that are iterated to a
flapping next hop, which reduces system resource consumption and CPU usage.
1.10.9.2.24 BGP-LS
BGP-link state (LS) enables BGP to report topology information collected by IGPs to the
controller.
Background
BGP-LS is a new method of collecting topology information.
Without BGP-LS, the router uses an IGP (OSPF or IS-IS) to collect topology information of
each AS, and the IGP reports the information to the controller. This topology information
collection method has the following disadvantages:
The controller must have high computing capabilities and support the IGP and its
algorithm.
The controller cannot gain the complete inter-AS topology information and therefore is
unable to calculate optimal E2E paths.
Different IGPs report topology information separately to the controller, which
complicates the controller's analysis and processing.
For details on how OSPF collects topology information, see NE20E Feature Description -
OSPF.
For details on how IS-IS collects topology information, see NE20E Feature Description -
IS-IS.
With powerful routing capabilities of BGP, BGP-LS has the following advantages:
Reduces the controller's computing capability requirements and eliminates the need for
the controller to run IGPs.
Facilitates route selection and calculation on the controller by using BGP to summarize
the topology information of each IGP process or AS and report the complete information
to the controller.
Requires only one routing protocol (BGP) to report topology information to the
controller.
Related Concepts
BGP-LS provides a simple and efficient method of collecting topology information.
BGP-LS routes carry topology information and are classified into three types, which carry
node, link, and route prefix information, respectively. These route types jointly describe
the topology.
Item Description
NODE Field indicating that the BGP-LS route is a
node route.
ISIS-LEVEL-1 Protocol that collects topology information.
The protocol is IS-IS in this example.
IDENTIFIER0 Identifier of the protocol that collects
topology information.
LOCAL Field indicating information of a local node.
as BGP-LS domain AS number.
bgp-ls-identifier BGP-LS domain ID.
ospf-area-id OSPF area ID.
igp-router-id IGP router ID, generated by the IGP that
collects topology information. The router ID
is obtained from the NET of an IS-IS
process in this example.
Item Description
LINK Field indicating that the BGP-LS route is a
link route.
ISIS-LEVEL-1 Protocol that collects topology information.
The protocol is IS-IS in this example.
IDENTIFIER0 Identifier of the protocol that collects
topology information.
LOCAL Field indicating information of a local node.
as BGP-LS domain AS number.
bgp-ls-identifier BGP-LS domain ID.
ospf-area-id OSPF area ID.
igp-router-id IGP router ID, generated by the IGP that
collects topology information. The router ID
is obtained from the NET of an IS-IS
process in this example.
REMOTE Field indicating information of a remote
node.
if-address IP address of the local interface.
peer-address IP address of the remote interface.
mt-id ID of the topology.
Item Description
mt-id ID of the topology.
ospf-route-type OSPF route type:
1: Intra-Area
2: Inter-Area
3: External 1
4: External 2
5: NSSA 1
6: NSSA 2
Typical Networking
Networking in which topology information is collected within an IGP area
In Figure 1-898, Device A, Device B, Device C, and Device D use IS-IS to communicate with
each other at the network layer. They are all Level-2 devices in the same area (area 10). Only
one of the four devices needs to have BGP-LS deployed and establish a BGP-LS peer
relationship with the controller so that BGP-LS can collect and report topology information to
the controller. To improve reliability, deploying BGP-LS on two or more devices and
establishing a BGP-LS peer relationship between each BGP-LS device and the controller are
recommended. The BGP-LS devices collect the same topology information, and they back up
each other in case one of them fails.
Figure 1-898 Networking in which topology information is collected within an IGP area
In Figure 1-899, Device A, Device B, Device C, and Device D use IS-IS to communicate with
each other at the network layer. Device A, Device B, and Device C reside in area 10, whereas
Device D resides in area 20. Device A and Device B are Level-1 devices, Device C is a
Level-1-2 device, and Device D is a Level-2 device. Only one of the four devices needs to
have BGP-LS deployed and establish a BGP-LS peer relationship with the controller so that
BGP-LS can collect and report topology information to the controller. To improve reliability,
deploying BGP-LS on two or more devices and establishing a BGP-LS peer relationship
between each BGP-LS device and the controller are recommended. The BGP-LS devices
collect the same topology information, and they back up each other in case one of them fails.
Figure 1-899 Networking in which topology information is collected between IGP areas
Figure 1-900 Networking 1 in which topology information is collected between BGP ASs
If two controllers are deployed and are connected to different ASs, for example in Figure
1-901, a BGP-LS peer relationship must be established between the two controllers or
between Device B and Device C so that both controllers can obtain topology information on
the whole network.
Figure 1-901 Networking 2 in which topology information is collected between BGP ASs
To reduce the number of connections to the controller, deploy one or more BGP-LS RRs and establish
BGP-LS peer relationships between each RR and the devices that require BGP-LS peer relationships
with the controller.
Usage Scenario
The router functions as a forwarder and reports topology information to the controller for
topology monitoring and traffic control.
Benefits
BGP-LS offers the following benefits:
Reduces computing capability requirements of the controller.
Allows the controller to gain the complete inter-AS topology information.
Requires only one routing protocol (BGP) to report topology information to the
controller.
Definition
Routing policies are used to filter routes and control how routes are received and advertised.
If route attributes, such as reachability, are changed, the path along which network traffic
passes changes accordingly.
Purpose
When advertising, receiving, and importing routes, the router implements certain routing
policies based on actual networking requirements to filter routes and change the route
attributes. Routing policies serve the following purposes:
Control route advertising
Only routes that match the rules specified in a policy are advertised.
Control route receiving
Only the required and valid routes are received, which reduces the routing table size and
improves network security.
Filter and control imported routes
A routing protocol may import routes discovered by other routing protocols. Only routes
that satisfy certain conditions are imported to meet the requirements of the protocol.
Modify attributes of specified routes
To enrich routing information, a routing protocol may import routing information
discovered by other routing protocols. Only the routing information that satisfies the
conditions is imported. Some attributes of the imported routing information are changed
to meet the requirements of the routing protocol.
Benefits
Routing policies have the following benefits:
Control the routing table size, saving system resources.
Control route receiving and advertising, improving network security.
Modify attributes of routes for proper traffic planning, improving network performance.
Policy-based routing (PBR) is implemented through user-defined routing policies. PBR selects
routes based on the user-defined routing policies, with reference to the source IP addresses
and lengths of incoming packets. PBR can be used to improve security and implement load
balancing.
A routing policy and PBR have different mechanisms. Table 1-257 shows the differences
between them.
1.10.10.2 Principles
Implementation
Routing policies are implemented in the following steps:
1. Define rules. Characteristics of routing information to which routing policies are applied
need to be defined. Specifically, you need to define a set of matching rules regarding
different attributes of routing information, such as the destination address and AS
number.
2. Apply rules. Matching rules are applied to advertise, receive, or import routes.
Filter
A filter is the core of a routing policy and is defined using a set of matching rules. The NE20E
provides the filters listed in Table 1-258.
The ACL, IP prefix list, AS_Path, community, Extended community, and RD filters can be
used to filter routes but cannot modify route attributes. A route-policy is a comprehensive
filter and can use the matching rules of the ACL, IP prefix list, AS_Path, community,
Extended community, and RD filters to filter routes and change route attributes.
ACL
An ACL is a set of sequential filtering rules. Users can define rules based on packet
information, such as inbound interfaces, source or destination IP addresses, protocol types, or
source or destination port numbers and specify an action to deny or permit packets. After an
ACL is configured, the system classifies received packets based on the rules defined in the
ACL and denies or permits the packets accordingly.
An ACL only classifies packets based on defined rules and filters packets only after it is
applied to a routing policy.
ACLs can be configured for both IPv4 packets and IPv6 packets. Based on the usage, ACLs
are classified as interface-based ACLs, basic ACLs, or advanced ACLs. Users can specify the
IP address and subnet address range in an ACL to match the source IP address, destination
network segment address, or the next hop address of a route.
ACLs can be configured on access or core devices to:
Protect the devices against IP, TCP, and Internet Control Message Protocol (ICMP)
packet attacks.
Control network access. For example, ACLs can be used to control the access of
enterprise network users to external networks, the specific network resources that users
can access, and the period for which users can access networks.
Limit network traffic and improve network performance. For example, ACLs can be
used to limit bandwidth for upstream and downstream traffic, charge for the bandwidth
that users have applied for, and fully use high-bandwidth network resources.
For details about ACL features, see 1.9.3 ACL.
IP Prefix List
An IP prefix list contains a group of route filtering rules. Users can specify the prefix and
mask length range to match the destination network segment address or the next hop address
of a route. An IP prefix list is used to filter routes that are advertised and received by various
dynamic routing protocols.
An IP prefix list is easier and more flexible than an ACL. However, if a large number of
routes with different prefixes need to be filtered, configuring an IP prefix list to filter the
routes is complex.
IP prefix lists can be configured for both IPv4 routes and IPv6 routes, and they share the same
implementation process. An IP prefix list filters routes based on the mask length or mask
length range.
Mask length: An IP prefix list filters routes based on IP address prefixes. An IP address
prefix is defined by an IP address and the mask length. For example, for route
10.1.1.1/16, the mask length is 16 bits, and the valid prefix is 16 bits (10.1.0.0).
Mask length range: Routes with the IP address prefix and mask length within the range
defined in the IP prefix list meet the matching rules.
0.0.0.0 is a wildcard address. If the IP prefix is 0.0.0.0, specify either a mask or a mask length range,
with the following results:
If a mask is specified, all routes with the mask are permitted or denied.
If a mask length range is specified, all routes with the mask length in the range are permitted or
denied.
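The mask-length and mask-length-range matching described above can be sketched with a hypothetical helper built on Python's ipaddress module. The function name and parameters are illustrative, not device configuration:

```python
import ipaddress

def matches_prefix_rule(route: str, prefix: str, ge=None, le=None) -> bool:
    """Return True if the route matches one prefix-list rule: its first
    prefix-length bits must equal the rule's prefix, and its own mask
    length must fall in the configured range (exactly the rule's mask
    length when no greater-equal/less-equal range is given)."""
    r = ipaddress.ip_network(route, strict=False)
    p = ipaddress.ip_network(prefix, strict=False)
    lo = ge if ge is not None else p.prefixlen
    hi = le if le is not None else lo
    if not (lo <= r.prefixlen <= hi):
        return False
    return r.network_address in p  # prefix bits of the route match the rule

# Rule "10.1.0.0/16 greater-equal 16 less-equal 24":
print(matches_prefix_rule("10.1.1.0/24", "10.1.0.0/16", ge=16, le=24))  # True
print(matches_prefix_rule("10.2.0.0/16", "10.1.0.0/16", ge=16, le=24))  # False
print(matches_prefix_rule("10.1.1.0/25", "10.1.0.0/16"))                # False
```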
AS_Path
An AS_Path filter is used to filter BGP routes based on the AS_Path attributes that they carry.
The AS_Path attribute records, in vector order, the numbers of all the ASs through which a
BGP route passes from the local end to the destination. Therefore, AS_Path attributes can be
used to filter BGP routes.
The matching condition of an AS_Path is specified using a regular expression. For example,
^30 indicates that only the AS_Path attribute starting with 30 is matched. Using a regular
expression can simplify configurations. For details about regular expressions, see
Configuration Guide - Basic Configurations.
The AS_Path attribute is a private attribute of BGP and is therefore used to filter BGP routes only. For
details about the AS_Path attribute, see 1.10.9.2.1 Basic Principle.
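As a rough illustration of how a regular expression such as ^30 selects AS_Path strings, consider the following sketch. The AS numbers and paths are hypothetical, and the word boundary (\b) prevents 300 or 3000 from matching:

```python
import re

# Hypothetical AS_Path strings as recorded in BGP routes; the left-most
# AS number is the most recently traversed neighbor AS.
as_paths = ["30 200 100", "65001 30", "300 40"]

# A "^30" style filter: match AS_Paths whose first AS number is 30.
pattern = re.compile(r"^30\b")
matched = [p for p in as_paths if pattern.search(p)]
```

Here only "30 200 100" is selected: "65001 30" does not begin with 30, and "300 40" begins with the different AS number 300.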
Community
A community filter is used to filter BGP routes based on the community attributes contained
in BGP routes. The community attribute identifies a set of destination addresses with the
same characteristics. Therefore, community attributes can be used to filter BGP routes.
In addition to the well-known community attributes, users can define community attributes
using digits. The matching condition of a community filter can be specified using a
community ID or a regular expression.
Like AS_Path filters, community filters are used to filter only BGP routes because the community
attribute is also a private attribute of BGP. For details about the community attribute, see 1.10.9.2.6
Community Attribute.
Extended Community
An extended community filter is used to filter BGP routes based on extended community attributes.
BGP extended community attributes are classified into two types:
VPN target: A VPN target controls route learning between VPN instances, isolating
routes of VPN instances from each other. A VPN target may be either an import or export
VPN target. Before advertising a VPNv4 or VPNv6 route to a remote MP-BGP peer, a
PE adds an export VPN target to the route. After receiving a VPNv4 or VPNv6 route, the
remote MP-BGP peer compares the received export VPN target with the local import
VPN target. If they are the same, the remote MP-BGP peer adds the route to the routing
table of the local VPN instance.
Site of Origin (SoO): Several CEs at a VPN site may be connected to different PEs.
The VPN routes advertised from the CEs to the PEs may be re-advertised to the VPN site
where the CEs reside after the routes have traversed the backbone network, causing
routing loops at the VPN site. In this situation, configure an SoO attribute for VPN
routes. With the SoO attribute, routes advertised from different VPN sites can be
distinguished and will not be advertised to the source VPN site, preventing routing loops.
The formats of a VPN target attribute and an SoO attribute are the same. The matching
condition of an extended community can be specified using an extended community ID or a
regular expression.
An extended community is used to filter only BGP routes because the extended community attribute is
also a private attribute of BGP. For details about the extended community attribute, see 1.14.6.2.1 Basic
BGP/MPLS IP VPN.
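The import/export VPN target comparison described above can be modeled as a simple set intersection. This is a sketch with invented names, not the actual MP-BGP implementation:

```python
def accept_vpn_route(route_export_rts, local_import_rts):
    """A received VPNv4/VPNv6 route is installed into a local VPN instance
    if any export VPN target carried by the route matches any import VPN
    target configured for that instance."""
    return bool(set(route_export_rts) & set(local_import_rts))
```

For example, a route carrying export target 100:1 is accepted by an instance whose import targets include 100:1, and rejected by an instance configured only with 200:1.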
RD
An RD filter is used to filter BGP routes based on the RDs carried in VPN routes. RDs are
used to distinguish IPv4 and IPv6 prefixes in the same address segment in different VPN
instances. An RD filter specifies matching rules regarding RD attributes.
For details about how to configure an RD, see HUAWEI NE20E-S2 Universal Service
Router Configuration Guide – VPN.
Route-Policy
A route-policy is a comprehensive filter. It is used to match attributes of specified routes and
change route attributes when specific conditions are met. A route-policy can use the preceding
six filters to define its matching rules.
Composition of a Route-Policy
As shown in Figure 1-902, a route-policy consists of node IDs, matching mode, if-match
clauses, and apply clauses.
− Node ID
A route-policy consists of one or more nodes, each identified by a node ID. In a
route-policy, routes are filtered based on the following rules:
Sequential matching: The system checks entries based on node IDs in
ascending order. Therefore, specifying the node IDs in the required sequence is
recommended.
One-time matching: The relationship between the nodes of a route-policy is
"OR". If a route matches one node, the route matches the route-policy and will
not be matched against the next node.
− Matching mode
Either of the following matching modes can be used:
permit: specifies the permit mode of a node. If a route matches the if-match
clauses of a node, all the actions defined by apply clauses are performed, and
the matching is complete. If a route does not match the if-match clauses of the
node, the route continues to match against the next node.
deny: specifies the deny mode of a node. In deny mode, the apply clauses are
not used. If a route matches all the if-match clauses of the node, the route is
denied by the node and no longer matches against the next node. If the route
does not match any of the if-match clauses, the route continues to match
against the next node.
To allow the remaining routes to pass, a node in permit mode that contains no if-match or apply clause
needs to be configured after the nodes in deny mode.
− if-match clause
The if-match clause defines the matching rules.
Each node of a route-policy can comprise multiple if-match clauses or no if-match
clause at all. If no if-match clause is configured for a node in permit mode, all
IPv4 and IPv6 routes match the node. If an if-match clause is configured to match
only IPv4 routes for a node in permit mode, matching IPv4 routes and all IPv6
routes match the node. If an if-match clause is configured to match only IPv6
routes for a node in permit mode, matching IPv6 routes and all IPv4 routes match
the node.
− apply clause
The apply clauses specify actions. When a route matches a route-policy, the system
sets some attributes for the route based on the apply clause.
Each node of a route-policy can comprise multiple apply clauses or no apply clause
at all. The apply clause is not used when routes need to be filtered but attributes of
the routes do not need to be changed.
Matching results of a route-policy
The matching results of a route-policy are obtained based on the following aspects:
− Matching mode of the node, either permit or deny
− Matching rules (either permit or deny) contained in the if-match clause (such as
ACLs or IP prefix lists)
The matching results are listed in Table 1-259.
On the HUAWEI NE20E-S2, all routes that fail to match a route-policy are denied by the route-policy
by default. If more than one node is defined in a route-policy, at least one of them must be in permit
mode. The reason is as follows:
If a route fails to match any of the nodes, the route is denied by the route-policy.
If all the nodes in the route-policy are set in deny mode, all the routes to be filtered are denied by the
route-policy.
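The node-by-node evaluation rules above (sequential matching, one-time matching, permit/deny modes, and the default deny for unmatched routes) can be sketched as follows. The data structures and names are invented for illustration and do not reflect the device's implementation:

```python
def evaluate_route_policy(route, nodes):
    """Evaluate a route against route-policy nodes in ascending node-ID order.

    Each node is a tuple (node_id, mode, if_match_fns, apply_fns):
      - mode is "permit" or "deny"
      - a node matches when ALL of its if-match functions return True;
        a node with no if-match clauses matches every route
    Returns (permitted, route); routes matching no node are denied by default.
    """
    for node_id, mode, if_matches, applies in sorted(nodes, key=lambda n: n[0]):
        if all(check(route) for check in if_matches):
            if mode == "permit":
                for apply_fn in applies:      # set route attributes
                    apply_fn(route)
                return True, route            # matched: stop at this node
            return False, route               # denied by a deny-mode node
    return False, route                       # default: deny unmatched routes
```

For example, a deny node matching 10.0.0.0/8 routes followed by an empty permit node denies the 10.0.0.0/8 routes while permitting (and optionally modifying) all others, mirroring the recommendation above.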
Other Functions
In addition to the preceding functions, routing policies provide an enhanced feature: BGP to IGP.
In some scenarios, when an IGP uses a routing policy to import BGP routes, route attributes
(for example, the cost) need to be set based on private BGP attributes, such as the community
attribute. However, without the BGP to IGP feature, the IGP cannot identify private attributes,
such as community attributes, and the BGP routes are therefore denied. As a result, apply
clauses used to set route attributes do not take effect.
With the BGP to IGP feature, route attributes can be set based on private attributes, such as
the community, extended community, and AS_Path attributes in BGP routes. The BGP to IGP
implementation process is as follows:
When an IGP imports BGP routes through a route-policy, route attributes can be set
based on private attributes, such as the community attribute in BGP routes.
If BGP routes carry private attributes, such as community attributes, the system filters
the BGP routes based on the private attributes. If the BGP routes meet the matching rules,
the routes match the route-policy, and apply clauses take effect.
If BGP routes do not carry private attributes, such as community attributes, the BGP
routes fail to match the route-policy and are denied, and apply clauses do not take effect.
1.10.10.3 Applications
There are multiple approaches to meet the preceding requirements, and the following two
approaches are used in this example:
Use IP prefix lists
− Configure an IP prefix list for Device A and configure the IP prefix list as an export
policy on Device A for OSPF.
− Configure another IP prefix list for Device C and configure the IP prefix list as an
import policy on Device C for OSPF.
Use route-policies
− Configure a route-policy (the matching rules can be the IP prefix list, cost, or route
tag) for Device A and configure the route-policy as an export policy on Device A for
OSPF.
− Configure another route-policy on Device C and configure the route-policy as an
import policy on Device C for OSPF.
Compared with an IP prefix list, a route-policy can change route attributes and control
routes more flexibly, but it is more complex to configure.
To meet the preceding requirements, configure a route-policy for Device A to set a tag for the
imported IS-IS routes. Device D identifies the IS-IS routes from OSPF routes based on the
tag.
To establish an inter-AS label switched path (LSP) between PE1 and PE2, route-policies need
to be configured for autonomous system boundary routers (ASBRs).
When an ASBR advertises the routes received from a PE in the same AS to the peer
ASBR, the ASBR allocates MPLS labels to the routes using a route-policy.
When an ASBR advertises labeled IPv4 routes to a PE in the same AS, the ASBR
reallocates MPLS labels to the routes using another route-policy.
In addition, to control route transmission between different VPN instances on a PE, configure
a route-policy for the PE and configure the route-policy as an import or export policy for the
VPN instances.
To enable devices on the MAN to access the backbone network, Device C and Device D need
to import routes. When OSPF imports BGP routes, a routing policy can be configured to
control the number of imported routes based on private attributes (such as the community) of
the imported BGP routes or modify the cost of the imported routes to control the MAN egress
traffic.
1.11 IP Multicast
1.11.1 About This Document
Purpose
This document describes the IP multicast feature in terms of its overview, principles, and
applications.
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
Network planning engineers
Commissioning engineers
Data configuration engineers
System maintenance engineers
Security Declaration
Encryption algorithm declaration
The encryption algorithms DES/3DES/SKIPJACK/RC2/RSA (RSA-1024 or
lower)/MD2/MD4/MD5 (in digital signature scenarios and password encryption)/SHA1
(in digital signature scenarios) have low security and may bring security risks. If
protocols allow, using more secure encryption algorithms, such as AES/RSA
(RSA-2048 or higher)/SHA2/HMAC-SHA2, is recommended.
Password configuration declaration
− Do not set both the start and end characters of a password to "%^%#". Otherwise,
the password will be displayed directly in the configuration file.
− To further improve device security, periodically change the password.
Personal data declaration
Your purchased products, services, or features may use some of users' personal data during
service operation or fault locating. You must define user privacy policies in compliance
with local laws and take proper measures to fully protect personal data.
Feature declaration
− The NetStream feature may be used to analyze the communication information of
terminal customers for network traffic statistics and management purposes. Before
enabling the NetStream feature, ensure that it is performed within the boundaries
permitted by applicable laws and regulations. Effective measures must be taken to
ensure that information is securely protected.
− The mirroring feature may be used to analyze the communication information of
terminal customers for a maintenance purpose. Before enabling the mirroring
function, ensure that it is performed within the boundaries permitted by applicable
laws and regulations. Effective measures must be taken to ensure that information is
securely protected.
− The packet header obtaining feature may be used to collect or store some
communication information about specific customers for transmission fault and
error detection purposes. Huawei cannot offer services to collect or store this
information unilaterally. Before enabling the function, ensure that it is performed
within the boundaries permitted by applicable laws and regulations. Effective
measures must be taken to ensure that information is securely protected.
Reliability design declaration
Network planning and site design must comply with reliability design principles and
provide device- and solution-level protection. Device-level protection includes planning
principles of dual-network and inter-board dual-link to avoid single point or single link
of failure. Solution-level protection refers to a fast convergence mechanism, such as FRR
and VRRP.
Special Declaration
This document serves only as a guide. The content is written based on device
information gathered under lab conditions. The content provided by this document is
intended to be taken as general guidance, and does not cover all scenarios. The content
provided by this document may be different from the information on user device
interfaces due to factors such as version upgrades and differences in device models,
board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are
beyond the scope of this document.
The maximum values provided in this document are obtained in specific lab
environments (for example, only a certain type of board or protocol is configured on a
tested device). The actually obtained maximum values may be different from the
maximum values provided in this document due to factors such as differences in
hardware configurations and carried services.
Interface numbers used in this document are examples. Use the existing interface
numbers on devices for configuration.
The pictures of hardware in this document are for reference only.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
Danger: Indicates an imminently hazardous situation which, if not
avoided, will result in death or serious injury.
Change History
Updates between document issues are cumulative. Therefore, the latest document issue
contains all updates made in previous issues.
Changes in Issue 03 (2017-09-20)
This issue is the third official release. The software version of this issue is
V800R009C10SPC200.
Changes in Issue 02 (2017-07-30)
This issue is the second official release. The software version of this issue is
V800R009C10SPC100.
Changes in Issue 01 (2017-05-30)
This issue is the first official release. The software version of this issue is
V800R009C10.
IP Transmission Modes
Based on the IP address types, networks can transmit packets in the following modes:
IP unicast mode
IP broadcast mode
IP multicast mode
Any of these modes can be used for P2MP data transmission.
Unicast transmission
− Features: A unicast packet uses a unicast address as the destination address. If
multiple receivers require the same packet from a source, the source sends an
individual unicast packet to each receiver.
− Disadvantages: This mode consumes unnecessary bandwidth and processor
resources when sending the same packet to a large number of receivers.
Additionally, the unicast transmission mode does not guarantee transmission quality
when a large number of hosts exist.
Broadcast transmission
− Features: A broadcast packet uses a broadcast address as the destination address. In
this mode, a source sends only one copy of each packet to all hosts on the network
segment, irrespective of whether a host requires the packet.
− Disadvantages: This mode requires that the source and receivers reside on the same
network segment. Because all hosts on the network segment receive packets sent by
the source, this mode cannot guarantee information security or charging of services.
Multicast transmission
As shown in Figure 1-907, a source exists on the network. User A and User C require
information from the source, while User B does not. The transmission mode is multicast.
− Advantages: In multicast mode, a single information flow is sent to users along the
distribution tree, and a maximum of one copy of the data flow exists on each link.
Users who do not require the packet do not receive the packet, providing the basis
for information security. Compared with unicast, multicast does not increase the
network load when the number of users increases in the same multicast group. This
advantage prevents the server and CPU from being overloaded. Compared with
broadcast, multicast can transmit information across network segments and across
long distances.
Multicast technologies therefore provide the ideal solution when one source must
address multiple receivers with efficient P2MP data transmission.
− Multicast applications: Multicast applies to all P2MP applications, such as
multimedia presentations, streaming media, communications for training and
tele-learning, highly reliable data storage, and finance (stock-trading) applications.
IP multicast is being widely used in Internet services, such as online broadcast,
network TV, distance learning, remote medicine, network TV broadcast, and
real-time video and audio conferencing.
1.11.2.2 Principles
1.11.2.2.1 Basic Concepts
Multicast Group
A multicast group consists of a group of receivers that require the same data stream. A
multicast group uses an IP multicast address identifier. A host that joins a multicast group
becomes a member of the group and can identify and receive IP packets that have the IP
multicast address as the destination address.
Multicast Source
A multicast source sends IP packets that carry multicast destination addresses.
A multicast source can simultaneously send data to multiple multicast groups.
Multiple multicast sources can simultaneously send data to a same multicast group.
Multicast Router
A router that supports the multicast feature is called a multicast router.
A multicast router implements the following functions:
Manages group members on the leaf segment networks that connect to users.
Routes and forwards multicast packets.
IP multicast is an end-to-end service. Figure 1-908 shows the four IP multicast functions from
the lower protocol layer to the upper protocol layer.
A permanent multicast group address, also known as a reserved multicast group address,
identifies all devices in a multicast group that may contain any number (including 0) of
members. For details, see Table 1-262.
A temporary multicast group address, also known as a common group address, is an IPv4
address that is assigned to a multicast group temporarily. If there is no user in this group,
this address is reclaimed.
Scope Description
FF0x::/32 Well-known multicast addresses defined by the IANA
For details, see Table 1-264.
FF1x::/32 (x cannot be 1 or 2) ASM addresses valid on the entire network
FF2x::/32 (x cannot be 1 or 2)
FF3x::/32 (x cannot be 1 or 2) SSM addresses
This is the default SSM group address scope and is
valid on the entire network.
Figure 1-910 Mapping relationships between multicast IPv4 addresses and multicast MAC
addresses
The 25 most significant bits of an IPv4 multicast MAC address are fixed (0x01005E followed
by a 0 bit). The first four bits of an IPv4 multicast address are always 1110; of the remaining
28 bits, only the low-order 23 bits are mapped into the MAC address, resulting in the loss of
5 bits. Therefore, 32 IPv4 multicast addresses are mapped to the same MAC address.
The IANA defines that the higher-order 16 bits of an IPv6 MAC address are 0x3333, and the
low-order 32 bits of an IPv6 MAC address are the same as those of a multicast IPv6 address.
Figure 1-911 shows the mapping relationship between the multicast IPv6 address and
multicast IPv6 MAC address.
Figure 1-911 Mapping relationships between multicast IPv6 addresses and multicast MAC
addresses
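The IPv4 and IPv6 mappings described above can be computed directly. The following sketch also demonstrates the 32-to-1 overlap of IPv4 multicast addresses on a single MAC address:

```python
import ipaddress

def ipv4_multicast_mac(addr):
    """Map an IPv4 multicast address to its MAC address:
    fixed 25 bits (01-00-5E + 0 bit) + low-order 23 bits of the address."""
    low23 = int(ipaddress.IPv4Address(addr)) & 0x7FFFFF
    mac = 0x01005E000000 | low23
    return "-".join(f"{(mac >> s) & 0xFF:02x}" for s in range(40, -1, -8))

def ipv6_multicast_mac(addr):
    """Map an IPv6 multicast address to its MAC address:
    fixed 16 bits (33-33) + low-order 32 bits of the address."""
    low32 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFFFF
    mac = 0x333300000000 | low32
    return "-".join(f"{(mac >> s) & 0xFF:02x}" for s in range(40, -1, -8))
```

For example, 224.1.1.1 and 225.1.1.1 differ only in the 5 lost bits and therefore map to the same MAC address, 01-00-5e-01-01-01.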
This document focuses on IP multicast technology and device operation. Multicast in the document
refers to IP multicast, unless otherwise specified.
The NE20E supports various multicast routing protocols to implement different applications.
Table 1-265 describes commonly used multicast routing protocols.
Multicast protocols have two main types of functions: managing member relationships;
establishing and maintaining multicast routes.
ASM Model
In the any-source multicast (ASM) model, any sender can act as a multicast source and send
information to a multicast group address. Receivers cannot know the multicast source location
before they join a multicast group.
SFM Model
From the sender's point of view, the source-filtered multicast (SFM) model works the same as
the ASM model. That is, any sender can act as a multicast source and send information to a
multicast group address.
Compared with the ASM model, the SFM model extends the following function: The upper
layer software checks the source addresses of received multicast packets, permitting or
denying packets of multicast sources as configured.
Compared with ASM, SFM adds multicast source filtering policies. The basic principles and
configurations of ASM and SFM are the same. In this document, information about ASM also applies to
SFM.
SSM Model
In real-world situations, users may not require all data sent by multicast sources. The
source-specific multicast (SSM) model allows users to specify multicast data sources.
Compared with receivers in the ASM model, receivers in the SSM model know the multicast
source location before they join a multicast group. The SSM model uses a different address
scope from the ASM model and sets up a dedicated forwarding path between a source and
receivers.
1.11.2.3 Applications
On this network:
P belongs to the public network. Each customer edge (CE) device belongs to a VPN.
Each router is dedicated to a network and maintains only one forwarding mechanism.
PEs are connected to both the public network and one or more VPNs. The
network information must be completely separated, and a separate set of forwarding
mechanisms needs to be maintained for each network. The set of software and hardware
resources that serves the same network on a PE is called an instance. A PE supports
multiple instances, and one instance can reside on multiple PEs.
For details of the multi-instance multicast technique, see the HUAWEI NE20E-S2 Universal Service
Router Feature Description - VPN.
1.11.3 IGMP
1.11.3.1 Introduction
Definition
In the TCP/IP protocol suite, the Internet Group Management Protocol (IGMP) manages IPv4
multicast members, and sets up and maintains multicast member relationships between IP
hosts and their directly connected multicast routers.
After IGMP is configured on hosts and their directly connected multicast routers, the hosts
can dynamically join multicast groups, and the multicast routers can manage multicast group
members on the local network.
IGMP implements the following functions on the host side and router side:
On the host side, IGMP allows hosts to dynamically join and leave multicast groups
anytime and anywhere. IGMP does not limit the number of hosts that can join or leave a
multicast group.
A host's operating system (OS) determines the IGMP version that the host supports.
On the router side, IGMP enables a router to determine whether multicast receivers of a
specific group exist. Each host stores information about only the multicast groups it
joins.
IGMP has three versions, as listed in Table 1-266:
Purpose
IGMP allows receivers to access IP multicast networks, join multicast groups, and receive
multicast data from multicast sources. IGMP manages multicast group members by
exchanging IGMP messages between hosts and routers. IGMP records host join and leave
information on interfaces, ensuring correct multicast data forwarding on the interfaces.
1.11.3.2 Principles
1.11.3.2.1 Principles of IGMP
IGMP Messages
IGMPv2 and IGMPv3 support leave messages, but IGMPv1 does not.
IGMPv1 does not support querier election; an IGMPv1 querier is designated by the upper-layer protocol,
such as PIM. In IGMPv2 and IGMPv3, querier election can be implemented only among multicast devices
that run the same IGMP version on a network segment.
IGMP Implementation
IGMP enables a multicast router to identify receivers by sending IGMP Query messages to
hosts and receiving IGMP Report messages and Leave messages from hosts. A multicast
router forwards multicast data to a network segment only if the network segment has
multicast group members. Hosts can decide whether to join or leave a multicast group.
As shown in Figure 1-916, IGMP-enabled Device A functions as a querier to periodically send
IGMP Query messages. All hosts (Host A, Host B, and Host C) on the same network segment
of Device A can receive these IGMP Query messages.
When a host (for example, Host A) receives an IGMP Query message of a multicast
group G, the processing flow is as follows:
− If Host A is already a member of group G, Host A replies with an IGMP Report
message of group G at a random time within the response period specified by
Device A.
After receiving the IGMP Report message, Device A records information about
group G and forwards the multicast data to the network segment of the host
interface that is directly connected to Device A. Meanwhile, Device A starts a timer
for group G or resets the timer if it has been started. If no members of group G
respond to Device A within the interval specified by the timer, Device A stops
forwarding the multicast data of group G.
− If Host A is not a member of any multicast group, Host A does not respond to the
IGMP Query message from Device A.
When a host (for example, Host A) joins a multicast group G, the processing flow is as
follows:
Host A sends an IGMP Report message of group G to Device A, instructing Device A to
update its multicast group information. Subsequent IGMP Report messages of group G
are triggered by IGMP Query messages sent by Device A.
When a host (for example, Host A) leaves a multicast group G, the processing flow is as
follows:
Host A sends an IGMP Leave message of group G to Device A. After receiving the
IGMP Leave message, Device A triggers a query to check whether group G has other
receivers. If Device A does not receive IGMP Report messages of group G within the
period specified by the query message, Device A deletes the information about group G
and stops forwarding multicast traffic of group G.
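The query/report/leave flow above can be modeled as a minimal per-group timer on the querier. Timer values and message handling are greatly simplified here; this is not an RFC-complete IGMP implementation, and the class and constant names are invented:

```python
class IgmpRouterState:
    """Minimal sketch of per-group state on an IGMP querier."""

    GROUP_MEMBERSHIP_TIMEOUT = 260  # seconds, illustrative value only

    def __init__(self):
        self.groups = {}  # group address -> remaining lifetime in seconds

    def on_report(self, group):
        # A Report (re)starts the group timer; the router forwards traffic
        # for this group onto the attached segment while the timer runs.
        self.groups[group] = self.GROUP_MEMBERSHIP_TIMEOUT

    def on_timer_tick(self, elapsed):
        # If no member responds to queries before the timer expires,
        # the router stops forwarding the group's traffic.
        for g in list(self.groups):
            self.groups[g] -= elapsed
            if self.groups[g] <= 0:
                del self.groups[g]

    def forwards(self, group):
        return group in self.groups
```

A Leave message would trigger an immediate group-specific query and shorten the timer rather than delete the state outright; that refinement is omitted for brevity.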
IGMP Characteristic
Version
segments. In IGMPv3, source information in multicast group records can be
filtered in either include mode or exclude mode:
− In include mode:
− If a source is included in a group record and the source is active, the
router forwards the multicast data of the source.
− If a source is included in a group record but the source is inactive, the
router deletes the source information and does not forward the multicast
data of the source.
− In exclude mode:
− If a source is active, the router forwards the multicast data of the source,
because there are hosts that require the multicast data of the source.
− If a source is inactive, the router does not forward the multicast data of
the source.
− If a source is excluded in a group record, the router forwards the
multicast data of the source.
IGMPv3 does not have the Report message suppression mechanism.
Therefore, all hosts joining a multicast group must reply with IGMP Report
messages when receiving IGMP Query messages.
In IGMPv3, multicast sources can be selected. Therefore, besides the
common query and multicast group query, an IGMPv3-enabled device adds
the designated multicast source and group query, enabling the router to find
whether receivers require data from a specified multicast source.
Advantages of IGMPv2 over IGMPv1:
IGMPv2 provides IGMP Leave messages, and thus IGMPv2 can manage
members of multicast groups effectively.
The multicast group can be selected directly, and thus the selection is more
precise.
Advantages of IGMPv3 over IGMPv2:
IGMPv3 allows hosts to select multicast sources, while IGMPv2 does not.
An IGMPv3 message contains records of multiple multicast groups, and
thus the number of IGMP messages is reduced on the network segment.
Group-policy
Group-policy is configured on router interfaces to allow the router to set restrictions on
specific multicast groups, so that entries will not be created for the restricted multicast
groups. This improves IGMP security.
IGMP-Limit
When a large number of multicast users request multiple programs simultaneously, bandwidth
resources may be exhausted and the router's performance degraded, deteriorating the
multicast service quality.
To prevent this problem, configure IGMP-limit on a router interface to limit the maximum
number of IGMP entries on the interface. When receiving an IGMP Join message from a user,
the router interface first checks whether the configured maximum number of IGMP entries is
reached. If the maximum number is reached, the router interface discards the IGMP Join
message and rejects the user. If the maximum number is not reached, the router interface sets
up an IGMP membership and forwards data flows of the requested multicast group to the user.
This mechanism enables users who have successfully joined multicast groups to enjoy
smoother multicast services.
For example, on the network shown in Figure 1-917, if the maximum number of IGMP entries
is set to 1 on Interface 1 of router A, Interface 1 allows only one host to join a multicast group
and creates an IGMP entry only for the permitted host.
The working principles of IGMP-limit are as follows:
IGMP-limit allows you to configure a maximum number of IGMP entries on a router
interface. After receiving an IGMP Join message, a router interface determines whether
to create an entry by checking whether the number of IGMP entries has reached the
upper limit on the interface.
IGMP-limit allows you to configure an ACL on a router interface, so that the interface
permits IGMP Join messages containing a group address, including a source-group
address, in the range specified in the ACL, irrespective of whether the configured
maximum number of IGMP entries is reached. An IGMP entry that contains a group
address in the range specified in the ACL is not counted as one entry on an interface.
The principles of counting the number of IGMP entries are as follows:
Each (*, G) entry is counted as one entry on an interface, and each (S, G) is counted as
one entry on an interface.
Source-specific multicast (SSM) mapping (*, G) entries are not counted as entries on an
interface, and each (S, G) entry mapped using the SSM-mapping mechanism is counted
as one entry on an interface.
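The admission and counting rules above can be sketched as follows. The function and parameter names are invented; the exempt_fn argument models the optional ACL whose matching entries are admitted without being counted:

```python
def admit_igmp_join(entry, current_count, limit, exempt_fn=None):
    """Decide whether an interface accepts a new IGMP join.

    entry is a ("*", G) or (S, G) tuple; each counted entry adds one to
    the interface's entry count. Returns (admitted, new_count)."""
    if exempt_fn and exempt_fn(entry):
        return True, current_count        # ACL-exempt: admitted, not counted
    if current_count >= limit:
        return False, current_count       # limit reached: discard the join
    return True, current_count + 1        # admitted and counted
```

With a limit of 1, the first join is admitted and counted, a second join is rejected, and an ACL-exempt join is still admitted without increasing the count.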
Figure 1-918 Source address-based filtering for IGMP Report or Leave messages
On the network shown in Figure 1-919, Device A is a querier that receives IGMP Report or
Leave messages from hosts. If Device B constructs bogus IGMP Query messages that contain
a source address lower than Device A's address, such as 10.0.0.1/24, Device A will become a
non-querier and fail to respond to IGMP Leave messages from hosts, so Device A continues to
forward multicast traffic to user hosts who have left, which wastes network resources. To
resolve this problem, you can configure an ACL rule on Device A to drop IGMP Query
messages with the source address 10.0.0.1/24.
Group-Policy
Group-policy is a filtering policy configured on router interfaces. For example, on the
network shown in Figure 1-920, Host A and Host C request to join the multicast group
225.1.1.1. Host B and Host D request to join the multicast group 226.1.1.1. Group-policy is
configured on router A to permit join requests only for the multicast group 225.1.1.1. Then,
router A creates entries for Host A and Host C, but not for Host B or Host D.
To improve network security and facilitate network management, you can use group-policy to
disable a router interface from receiving IGMP Report messages from or forwarding multicast
data to specific multicast groups.
Group-policy is implemented through access control list (ACL) configurations.
Background
IGMPv3 supports source-specific multicast (SSM) but IGMPv1 and IGMPv2 do not.
Although the majority of the latest multicast devices support IGMPv3, most legacy multicast
terminals support only IGMPv1 or IGMPv2. SSM mapping is a transition solution that
provides SSM services for such legacy multicast terminals.
Using rules that specify the mapping from a particular multicast group G to a source-specific
group, SSM mapping can convert IGMPv1 or IGMPv2 packets whose group addresses are
within the SSM range to IGMPv3 packets. This mechanism allows hosts running IGMPv1 or
IGMPv2 to access SSM services. SSM mapping allows IGMPv1 or IGMPv2 terminals to
access only specific sources, thus minimizing the risks of attacks on multicast sources.
For multicast groups in the SSM address range, a multicast device processes only (S, G) join
requests and does not process (*, G) requests. For details about SSM, see 1.11.4.2.2 PIM-SSM.
Implementation
As shown in Figure 1-922, on the user network segment of the SSM network, Host A runs
IGMPv3, Host B runs IGMPv2, and Host C runs IGMPv1. To enable the SSM network to
provide SSM services for all of the hosts without upgrading the IGMP versions to IGMPv3,
configure SSM mapping on the multicast device.
If Device A has SSM mapping enabled and is configured with mappings between group
addresses and source addresses, it will perform the following actions after receiving a (*, G)
message from Host B or Host C:
If the multicast group address contained in the message is within the any-source
multicast (ASM) range, Device A processes the request as described in 1.11.3.2.1
Principles of IGMP.
If the multicast group address contained in the message is within the SSM range, Device
A maps a (*, G) join message to multiple (S, G) join messages based on mapping rules.
With this processing, hosts running IGMPv1 or IGMPv2 can access multicast services
available only in the SSM range.
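The conversion described above can be sketched as follows (the mapping rules, addresses, and function names are illustrative examples, not product configuration):

```python
import ipaddress

# Illustrative SSM mapping rules: group range -> mapped source addresses.
SSM_RANGE = ipaddress.ip_network("232.0.0.0/8")
MAPPING_RULES = {
    ipaddress.ip_network("232.1.1.0/24"): ["10.1.1.1", "10.1.1.2"],
}

def map_star_g(group):
    """Convert a (*, G) join from an IGMPv1/v2 host into (S, G) joins.

    A group outside the SSM range is returned unchanged as ("*", G),
    to be handled by the normal ASM processing described earlier; a
    group inside the SSM range is expanded using the mapping rules.
    """
    g = ipaddress.ip_address(group)
    if g not in SSM_RANGE:
        return [("*", group)]
    joins = []
    for net, sources in MAPPING_RULES.items():
        if g in net:
            joins.extend((s, group) for s in sources)
    return joins
```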
Background
After IGMP is configured on hosts and the hosts' directly connected multicast device, the
hosts can dynamically join multicast groups, and the multicast device can manage multicast
group members on the local network.
In some cases, however, the device directly connected to a multicast device may not be a host
but an IGMP proxy-capable access device to which hosts are connected. If you configure only
IGMP on the multicast device, access device, and hosts, the multicast and access devices need
to exchange a large number of packets.
To resolve this problem, enable IGMP on-demand on the multicast device. The multicast
device sends only one general query message to the access device. After receiving the general
query message, the access device sends the collected Join and Leave status of multicast
groups to the multicast device. The multicast device uses the Join and Leave status of the
multicast groups to maintain multicast group memberships on the local network segment.
Benefits
IGMP on-demand reduces packet exchanges between a multicast device and its connected
access device and reduces the loads on these devices.
Related Concepts
IGMP on-demand enables a multicast device to send only one IGMP general query message
to its connected access device (IGMP proxy-capable) and to use Join/Leave status of multicast
groups reported by its connected access device to maintain IGMP group memberships.
Implementation
When a multicast device is directly connected to hosts, the multicast device sends IGMP
Query messages to and receives IGMP Report and Leave messages from the hosts to identify
the multicast groups that have receivers. The device directly connected to the multicast device,
however, may not be a host but an IGMP proxy-capable access device, as shown in Figure
1-923. After IGMP on-demand is enabled, the access device (CE) sends IGMP messages to the
multicast device (PE) only when the Join or Leave status of a group changes. To be specific,
the CE sends an IGMP Report message for a multicast group to the PE only when the first user
joins the multicast group and sends a Leave message only when the last user leaves the
multicast group.
After you enable IGMP on-demand on a multicast device connected to an IGMP proxy-capable access
device, the multicast device implements IGMP differently from standard IGMP in the following
aspects:
The multicast device interface connected to the access device sends only one IGMP general query
message to the access device.
The records about dynamically joined IGMP groups on the multicast device interface connected to
the access device do not time out.
The multicast device interface connected to the access device directly deletes the entry for a group
only after the multicast device interface receives an IGMP Leave message for the group.
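The first-join/last-leave reporting rule that IGMP on-demand relies on can be sketched as follows (a minimal illustration; the class and method names are hypothetical):

```python
class ProxyGroupState:
    """Sketch of the on-demand reporting rule described above: the access
    device reports a multicast group only when its first user joins and
    sends a Leave only when its last user leaves, so the multicast device
    sees one state change instead of per-host messages."""

    def __init__(self):
        self.users = set()

    def join(self, user):
        first = not self.users
        self.users.add(user)
        return "report" if first else None    # Report only for the first user

    def leave(self, user):
        self.users.discard(user)
        return "leave" if not self.users else None  # Leave only for the last user
```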
IGMP IPsec
IGMP IPsec is used to authenticate IGMP packets to prevent bogus IGMP protocol packet
attacks, improving multicast service security. IGMP IPsec applies to multicast devices
connected to user hosts.
IGMP IPsec uses security associations (SAs) to authenticate sent and received IGMP
packets. The IGMP IPsec implementation process is as follows:
Before an interface sends out an IGMP protocol packet, IPsec adds an AH header to the
packet.
After an interface receives an IGMP protocol packet, IPsec uses an SA to authenticate the
AH header in the packet. If the AH header is authenticated, the interface forwards the
packet. Otherwise, the interface discards the packet.
NOTE
For IPsec feature description, see 1.16.11 IPsec.
1.11.4 PIM
1.11.4.1 Introduction
Purpose
A multicast network requires multicast protocols to replicate and forward multicast data. PIM
is a widely used intra-domain multicast protocol that builds MDTs to transmit multicast data
between routers in the same domain.
PIM can create multicast routing entries on demand, forward packets based on multicast
routing entries, and dynamically respond to network topology changes.
Definition
If IPv4 PIM and IPv6 PIM implement a feature in the same way, details are not provided in this chapter.
For details about implementation differences, see 1.11.4.4 Appendix.
PIM is a multicast routing protocol that uses unicast routing protocols to forward data, but
PIM is independent of any specific unicast routing protocols.
PIM has two implementation modes: PIM-SM and PIM-SSM. These modes apply to both
IPv4 and IPv6 networks.
Benefits
PIM works together with other multicast protocols to implement applications, such as:
Multimedia and media streaming applications
Training and tele-learning communication
Data storage and financial management applications
IP multicast is being widely used in Internet services, such as online broadcasts, network TV,
e-learning, telemedicine, network TV stations, and real-time video/voice conferencing
services.
1.11.4.2 Principles
1.11.4.2.1 PIM-SM
PIM-SM implements P2MP data transmission on large-scale networks on which multicast
data receivers are sparsely distributed. PIM-SM forwards multicast data only to network
segments with active receivers that have requested the data.
PIM-SM assumes that no host wants to receive multicast data, so PIM-SM sets up an MDT
only after a host requests multicast data and then sends the data to the host along the MDT.
Concepts
This section provides basic PIM-SM concepts. Figure 1-925 shows a typical PIM-SM
network.
PIM device
A router that runs PIM is called a PIM device. A router interface on which PIM is
enabled is called a PIM interface.
PIM domain
A network constructed by PIM devices is called a PIM network.
A PIM-SM network can be divided into multiple PIM-SM domains by configuring BSR
boundaries on router interfaces to restrict BSR message transmission. PIM-SM domains
isolate multicast traffic between domains and facilitate network management.
Designated router
A designated router (DR) can be a multicast source's DR or a receiver's DR.
− A multicast source's DR is a PIM device directly connected to a multicast source
and is responsible for sending Register messages to an RP.
− A receiver's DR is a PIM device directly connected to receiver's hosts and is
responsible for sending Join messages to an RP and forwarding multicast data to
receiver's hosts.
RP
An RP is the forwarding core in a PIM-SM domain, used to process join requests of the
receiver's DR and registration requests of the multicast source's DR. An RP constructs an
MDT with the RP at the root and creates (S, G) entries to transmit multicast data to hosts.
All routers in the PIM-SM domain need to know the RP's location. The following table
lists the types of RPs.
BSR
A BSR on a PIM-SM network collects RP information, summarizes that information into
an RP-Set (group-RP mapping database), and advertises the RP-Set to the entire
PIM-SM network.
A network can have only one BSR but can have multiple C-BSRs. If a BSR fails, a new
BSR is elected from the C-BSRs.
RPT
An RPT is an MDT with an RP at the root and group members at the leaves.
SPT
An SPT is an MDT with the multicast source at the root and group members at the
leaves.
Implementation
The multicast data forwarding process in a PIM-SM domain is as follows:
1. Neighbor discovery
Each PIM device in a PIM-SM domain periodically sends Hello messages to all other
PIM devices in the domain to discover PIM neighbors and maintain PIM neighbor
relationships.
By default, a PIM device permits other PIM control messages or multicast messages from a neighbor,
irrespective of whether the PIM device has received Hello messages from the neighbor. However, if a
PIM device has the neighbor check function, the PIM device permits other PIM control messages or
multicast messages from a neighbor only after the PIM device has received Hello messages from the
neighbor.
2. DR election
PIM devices exchange Hello messages to elect a DR on a shared network segment. The
receiver's DR is the only multicast data forwarder on a shared network segment. The
source's DR is responsible for forwarding multicast data received from the multicast
source along an MDT.
3. RP discovery
An RP is the forwarding core in a PIM-SM domain. A dynamic or static RP forwards
multicast data over the entire network.
4. RPT setup
PIM-SM assumes that no hosts want to receive multicast data, so PIM-SM sets up an
RPT only after a host requests multicast data, and then sends the data from the RP to the
host along the RPT.
5. SPT switchover
A multicast group in a PIM-SM domain is associated with only one RP and one RPT. All
multicast data packets are forwarded by the RP. The path along which the RP forwards
multicast data may not be the shortest path from the multicast source to receivers. The
load of the RP increases when the multicast traffic volume increases. If the multicast data
forwarding rate exceeds a configured threshold, an RPT-to-SPT switchover can be
implemented to reduce the burden on the RP.
If a network problem occurs, the Assert mechanism or a DR switchover delay can be used to
guarantee that multicast data is transmitted properly.
Assert
If multiple multicast data forwarders exist on a network segment, each multicast packet
is repeatedly sent across the network segment, generating redundant multicast data. To
resolve this issue, the Assert mechanism can be used to select a unique multicast data
forwarder on a network segment.
DR switchover delay
If the role of an interface on a PIM device is changed from DR to non-DR, the PIM
device immediately stops using this interface to forward data. If multicast data sent from
a new DR does not arrive, multicast data traffic is temporarily interrupted. If a DR
switchover delay is configured, the interface continues to forward multicast data until the
delay expires. Setting a DR switchover delay prevents multicast data traffic from being
interrupted.
The detailed PIM-SM implementation process is as follows:
Neighbor Discovery
Each PIM-enabled interface on a PIM device sends Hello messages. A multicast packet that
carries a Hello message has the following features:
The destination address is 224.0.0.13, indicating that this packet is destined for all PIM
devices on the same network segment as the interface that sends this packet.
The source address is an interface address.
The TTL value is 1, indicating that the packet is sent only to neighbor interfaces.
Hello messages are used to discover neighbors, adjust protocol parameters, and maintain
neighbor relationships.
Discovering PIM neighbors
All PIM devices on the same network segment must receive multicast packets with the
destination address 224.0.0.13. Directly connected multicast routers can then learn
neighbor information from the received Hello messages.
A router can receive PIM control messages or multicast packets from a neighbor only
after the router receives a Hello message from the neighbor. PIM control messages and
multicast packets are used for creating multicast routing entries and maintaining MDTs.
Adjusting protocol parameters
A Hello message carries the following protocol parameters:
− DR_Priority: priority used by each router to elect a DR. The higher a router's
priority is, the higher the probability that the router will be elected as the DR.
− Holdtime: timeout period during which the neighbor remains in the reachable state.
− LAN_Delay: delay for transmitting a Prune message on the shared network
segment.
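The Hello parameters above are carried as TLV-encoded options (option types per RFC 4601: 1 for Holdtime, 2 for LAN Prune Delay, 19 for DR Priority). A minimal encoding sketch with example values, not an exact reproduction of any device's packet builder:

```python
import struct

def hello_options(holdtime=105, dr_priority=1, lan_delay_ms=500, override_ms=2500):
    """Encode the three Hello options discussed above as TLVs.

    Each option is type (2 bytes) | length (2 bytes) | value, in network
    byte order. The default values here are illustrative examples.
    """
    opts = struct.pack("!HHH", 1, 2, holdtime)                     # Holdtime
    opts += struct.pack("!HHHH", 2, 4, lan_delay_ms, override_ms)  # LAN Prune Delay
    opts += struct.pack("!HHI", 19, 4, dr_priority)                # DR Priority
    return opts
```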
DR Election
The network segment on which a multicast source or group members reside is usually
connected to multiple PIM devices, as shown in Figure 1-926. The PIM devices exchange
Hello messages to set up PIM neighbor relationships. A Hello message carries the DR priority
and the address of the interface that connects the PIM device to this network segment. The
router compares the local information with the information carried in the Hello messages sent
by other PIM devices to elect a DR. This process is a DR election. The election rules are as
follows:
The PIM router with the highest DR priority wins.
If PIM devices have the same DR priority or PIM devices that do not support Hello
messages carrying DR priorities exist on the network segment, the PIM device with the
highest IP address wins.
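The election rules above can be sketched as a single comparison (illustrative only; `None` stands for a device whose Hello messages do not carry a DR priority):

```python
import ipaddress

def elect_dr(neighbors):
    """Elect the DR on a shared segment per the rules above.

    `neighbors` is a list of (dr_priority, ip_address) tuples. If any
    device omits the priority, only IP addresses are compared; otherwise
    the highest priority wins, with the highest IP address as tiebreaker.
    """
    if any(prio is None for prio, _ in neighbors):
        return max(neighbors, key=lambda n: int(ipaddress.ip_address(n[1])))
    return max(neighbors, key=lambda n: (n[0], int(ipaddress.ip_address(n[1]))))
```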
RP Discovery
Static RP
c. The BSR collects the received information as an RP-Set, encapsulates the RP-Set
information in a Bootstrap message, and advertises the Bootstrap message to all
PIM-SM devices.
d. Each router uses the RP-Set information to perform the same calculations and
comparisons to elect an RP from multiple C-RPs. The election rules are as follows:
i. The C-RP whose served group address range has the longest mask matching the group address that users join wins.
ii. If group addresses that users join and are served by C-RPs have the same mask
length, the priorities of the C-RPs are compared. The C-RP with the highest
priority wins (the greater the priority value, the lower the priority).
iii. If the C-RPs have the same priority, the hash function is used. The C-RP with
the greatest calculated hash value wins.
iv. If none of the above criteria can determine a winner, the C-RP with the highest
address wins.
e. Because all routers use the same RP-Set and the same election rules, the
relationship between the multicast group and the RP is the same for all routers. The
routers save this relationship to guide subsequent multicast operations.
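The election rules in step d can be sketched as follows, using the RP hash function defined in RFC 4601 (the function names and tuple layout are illustrative; a lower priority value is preferred, as noted above):

```python
import ipaddress

def rp_hash(group, mask_len, c_rp):
    """RP hash function from RFC 4601 section 4.7.2 (IPv4)."""
    m = (0xFFFFFFFF << (32 - mask_len)) & 0xFFFFFFFF
    g = int(ipaddress.IPv4Address(group)) & m
    c = int(ipaddress.IPv4Address(c_rp))
    return (1103515245 * ((1103515245 * g + 12345) ^ c) + 12345) % (1 << 31)

def elect_rp(group, c_rps, hash_mask_len=30):
    """Pick the RP for `group` from the RP-Set per the rules above.

    Each C-RP is (address, served_prefix, priority).
    """
    g = ipaddress.IPv4Address(group)
    matches = [c for c in c_rps if g in ipaddress.ip_network(c[1])]
    return max(
        matches,
        key=lambda c: (
            ipaddress.ip_network(c[1]).prefixlen,   # longest mask wins
            -c[2],                                  # lowest priority value wins
            rp_hash(group, hash_mask_len, c[0]),    # greatest hash value wins
            int(ipaddress.IPv4Address(c[0])),       # highest address wins
        ),
    )
```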
If a router needs to interwork with an auto-RP-capable device, enable auto-RP listening.
After auto-RP listening is enabled, the router can receive auto-RP announcement and
discovery messages, parse the messages to obtain source addresses, and perform RPF
checks based on the source addresses.
− If an RPF check fails, the router discards the auto-RP message.
− If an RPF check succeeds, the router forwards the auto-RP message to PIM
neighbors. The auto-RP message carries the multicast group address range served
by the RP to guide subsequent multicast operations.
Auto-RP listening is supported only in IPv4 scenarios.
Embedded-RP
Embedded-RP is a mode used by the router in the ASM model to obtain an RP address.
This mode applies only within an IPv6 PIM-SM domain or between IPv6 PIM-SM
domains. To ensure consistent RP election results, an RP obtained in embedded-RP mode
takes precedence over RPs elected using other mechanisms. The address of an RP
obtained in embedded-RP mode must be embedded in an IPv6 multicast group address,
which must meet both of the following conditions:
− The address must be in the range of IPv6 multicast addresses.
− The address must not be within the SSM group address range.
After a router calculates the RP address from the IPv6 multicast group address, the router
uses the RP address to discover a route for forwarding multicast packets. The process for
calculating the RP address is as follows:
a. The router copies the first N bits of the network prefix in the IPv6 multicast group
address. Here, N is specified by the plen field.
b. The router replaces the last four bits with the contents of the RIID field. An RP
address is then obtained.
Figure 1-928 shows the mapping between the IPv6 multicast group address and RP
address.
Figure 1-928 Mapping between the IPv6 multicast group address and RP address
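The two-step calculation above can be sketched as follows (field layout per RFC 3956; the example group address in the test is illustrative):

```python
import ipaddress

def embedded_rp(group):
    """Derive the RP address embedded in an IPv6 multicast group address.

    Per the embedded-RP layout (RFC 3956), the group address carries a
    4-bit RIID field, an 8-bit plen field, and a 64-bit network prefix.
    The RP address keeps the first plen bits of the prefix and sets the
    last four bits to RIID.
    """
    g = int(ipaddress.IPv6Address(group))
    riid = (g >> 104) & 0xF                  # 4-bit RP interface ID
    plen = (g >> 96) & 0xFF                  # prefix length in bits
    prefix = (g >> 32) & ((1 << 64) - 1)     # 64-bit network prefix field
    kept = (prefix >> (64 - plen)) << (64 - plen)  # keep first plen bits
    rp = (kept << 64) | riid                 # low 4 bits become RIID
    return str(ipaddress.IPv6Address(rp))
```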
Anycast-RP
In a traditional PIM-SM domain, each multicast group is mapped to only one RP. When
the network is overloaded or traffic is heavy, many network problems can occur. For
example, if the RP is overloaded, routes will converge slowly, or the multicast
forwarding path will not be optimal.
Anycast-RP can be used to address these problems. Currently, Anycast-RP can be
implemented through MSDP or PIM:
− Through MSDP: Multiple RPs with the same address are configured in a PIM-SM
domain and MSDP peer relationships are set up between the RPs to share multicast
data sources.
This mode is only for use on IPv4 networks. For details about the implementation
principles, see 1.11.5.2.3 Anycast-RP in MSDP.
− Multiple RPs with the same address are configured in a PIM-SM domain and the
device where an RP resides is configured with a unique local address to identify the
RP. These local addresses are used to set up connectionless peer relationships
between the devices. The peers share multicast source information by exchanging
Register messages.
This mode is for use on both IPv4 and IPv6 networks.
These two modes cannot be both configured on the same device in a PIM-SM domain. If Anycast-RP is
implemented through PIM, you can also configure the device to advertise the source information
obtained from MSDP peers in another domain to peers in the local domain.
Receivers and the multicast source each select the RPs closest to their own location to create
RPTs. After receiving multicast data, the receiver's DR determines whether to trigger an SPT
switchover. Using Anycast-RP is an implementation strategy that facilitates optimal RPTs and
load sharing. The following section covers the principles of Anycast-RP in PIM.
In the PIM-SM domain shown in Figure 1-929, multicast sources S1 and S2 send multicast
data to multicast group G, and U1 and U2 are members of group G. Perform the following
operations to apply the PIM protocol to implement Anycast-RP in the PIM-SM domain:
Configure RP1 and RP2 and assign both the same IP address (address of a loopback
interface). Assume that the IP address is 10.10.10.10.
Set up a connectionless peer relationship between RP1 and RP2 using unique IP
addresses. Assume that the IP address of RP1 is 1.1.1.1 and the IP address of RP2 is
2.2.2.2.
The implementation of Anycast-RP in PIM is as follows:
1. The receiver sends a Join message to the closest RP and builds an RPT.
− U1 joins the RPT with RP1 at the root, and RP1 creates a (*, G) entry.
− U2 joins the RPT with RP2 at the root, and RP2 creates a (*, G) entry.
2. The multicast source sends a Register message to the closest RP.
− DR1 sends a Register message to RP1 and RP1 creates an (S1, G) entry. Multicast
data from S1 reaches U1 along the RPT.
− DR2 sends a Register message to RP2 and RP2 creates an (S2, G) entry. Multicast
data from S2 reaches U2 along the RPT.
3. After receiving Register messages from the source's DRs, RPs re-encapsulate the
Register messages and forward them to peers to share multicast source information.
− After receiving the (S1, G) Register message from DR1, RP1 replaces the source
and destination addresses with 1.1.1.1 and 2.2.2.2, respectively, and re-encapsulates
the message and sends it to RP2. Upon receiving the specially encapsulated
Register message from peer 1.1.1.1, RP2 processes this Register message without
forwarding it to other peers.
− After receiving the (S2, G) Register message from DR2, RP2 replaces the source
and destination addresses with 2.2.2.2 and 1.1.1.1, respectively, and re-encapsulates
the message and sends it to RP1. Upon receiving the specially encapsulated
Register message from peer 2.2.2.2, RP1 processes this Register message without
forwarding it to other peers.
4. The RP joins an SPT with the source's DR as the root to obtain multicast data.
− RP1 sends a Join message to S2. Multicast data from S2 first reaches RP1 along the
SPT and then reaches U1 along the RPT.
− RP2 sends a Join message to S1. Multicast data from S1 reaches RP2 first through
the SPT and then reaches U2 through the RPT.
5. After receiving multicast data, the receiver's DR determines whether to trigger an SPT
switchover.
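The Register handling in step 3 can be sketched as follows (a minimal illustration with hypothetical names):

```python
def handle_register(msg, local_rp_addr, peers):
    """Sketch of Anycast-RP Register handling (step 3 above).

    A Register received from a source's DR is re-encapsulated and
    forwarded to every configured peer, with the local unique address as
    the new source. A Register received from a peer is processed locally
    but never forwarded again, which prevents forwarding loops.

    `msg` is (source_address, inner_register); returns the list of
    (src, dst, inner_register) messages to send.
    """
    src, inner = msg
    if src in peers:
        return []                     # from a peer: consume, do not forward
    return [(local_rp_addr, peer, inner) for peer in peers]
```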
RPT Setup
Figure 1-930 shows the RPT setup and data forwarding processes.
To reduce the RPT forwarding loads and improve multicast data forwarding efficiency, PIM-SM
supports SPT switchovers, allowing a multicast network to set up an SPT with the multicast source at the
root. Then, the multicast source can send multicast data directly to receivers along the SPT.
SPT Switchover
In a PIM-SM domain, a multicast group interacts with only one RP, and only one RPT is set
up. If SPT switchover is not enabled, all multicast packets must be encapsulated in Register
messages and then sent to the RP. After receiving the packets, the RP de-encapsulates them
and forwards them along the RPT.
Since all multicast packets forwarded along the RPT are transferred by the RP, the RP may be
overloaded when multicast traffic is heavy. To resolve this problem, PIM-SM allows the RP or
the receiver's DR to trigger an SPT switchover.
Assert
Either of the following conditions indicates other multicast forwarders are present on the
network segment:
A multicast packet fails the RPF check.
The interface that receives the multicast packet is a downstream interface in the (S, G)
entry on the local router.
If other multicast forwarders are present on the network segment, the router starts the Assert
mechanism.
The router sends an Assert message through the downstream interface. The downstream
interface also receives an Assert message from a different multicast forwarder on the network
segment. The destination address of the multicast packet in which the Assert message is
encapsulated is 224.0.0.13. The source address of the packet is the downstream interface
address. The TTL value of the packet is 1. The Assert message carries the route cost from the
PIM device to the source or RP, priority of the used unicast routing protocol, and the group
address.
The router compares its information with the information contained in the message sent by its
neighbor. This is called Assert election. The election rules are as follows:
1. The router that runs a higher priority unicast routing protocol wins.
2. If the routers have the same unicast routing protocol priority, the router with the smaller
route cost to the source wins.
3. If the routers have the same priority and route cost, the router with the highest IP address
for the downstream interface wins.
The router performs the following operations based on the Assert election result:
If the router wins the election, the downstream interface of the router is responsible for
forwarding multicast packets on the network segment. The downstream interface is
called an Assert winner.
If the router does not win the election, the downstream interface is prohibited from
forwarding multicast packets and is deleted from the downstream interface list of the (S,
G) entry. The downstream interface is called an Assert loser.
After Assert election is complete, only one upstream router that has a downstream interface
exists on the network segment, and the downstream interface transmits only one copy of each
multicast packet. The Assert winner then periodically sends Assert messages to maintain its
status as the Assert winner. If the Assert loser does not receive any Assert messages from the
Assert winner after the timer of the Assert loser expires, the loser re-adds downstream
interfaces for multicast data forwarding.
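The three Assert election rules can be sketched as a single comparison (illustrative; each candidate is modeled as a tuple, following the rule ordering stated above):

```python
import ipaddress

def assert_winner(candidates):
    """Assert election per the rules above: higher unicast routing
    protocol priority wins, then lower route cost to the source, then
    higher downstream interface address.

    Each candidate is (protocol_priority, route_cost, interface_ip).
    """
    return max(
        candidates,
        key=lambda c: (c[0], -c[1], int(ipaddress.ip_address(c[2]))),
    )
```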
DR Switchover Delay
If an existing DR fails, the PIM neighbor relationship times out, and a new DR election is
triggered.
By default, when an interface changes from a DR to a non-DR, the router immediately stops
using the interface to forward data. If multicast data sent from a new DR has not yet arrived at
the interface, multicast data streams are temporarily interrupted.
When a PIM-SM interface that has a PIM DR switchover delay configured receives Hello
messages from a new neighbor and changes from a DR to a non-DR, the interface continues
to function as a DR and to forward multicast packets until the delay times out.
If the router that has a DR switchover delay configured receives packets from a new DR
before the delay expires, the router immediately stops forwarding packets. When a new IGMP
Report message is received on the shared network segment, the new DR (instead of the old
DR configured with a DR switchover delay) sends a PIM Join message to the upstream
device.
If the new DR receives multicast data from the original DR before the DR switchover delay expires, an
Assert election is triggered.
Each BSR administrative domain provides services to the multicast group within a
specific address range. The multicast groups that different BSR administrative domains
serve can overlap. However, a multicast group address that a BSR administrative domain
serves is valid only in its BSR administrative domain because a multicast address is a
private group address. As shown in Figure 1-933, the group address range of BSR1
overlaps with that of BSR3.
A multicast group that does not belong to any BSR administrative domain belongs to the
global domain. That is, the group address range of the global domain is G excluding G1 and G2.
Multicast function
As shown in Figure 1-932, the global domain and each BSR administrative domain have
their respective C-RP and BSR devices. Devices only function in the domain to which
they are assigned. Each BSR administrative domain has a BSR mechanism and RP
elections that are independent of other domains.
Each BSR administrative domain has a border. Multicast information for this domain,
such as the C-RP Advertisement messages and BSR Bootstrap message, can be
transmitted only within the domain. Multicast information for the global domain can be
transmitted throughout the entire global domain and can traverse any BSR administrative
domain.
1.11.4.2.2 PIM-SSM
Protocol Independent Multicast-Source-Specific Multicast (PIM-SSM) enables a user host to
rapidly join a multicast group if the user knows a multicast source address. PIM-SSM sets up
a shortest path tree (SPT) from a multicast source to a multicast group, while PIM-SM uses
rendezvous points (RPs) to set up rendezvous point trees (RPTs). Therefore, PIM-SSM
implements a faster join process than PIM-SM.
Different from the any-source multicast (ASM) model, the SSM model does not need to
maintain an RP, construct an RPT, or register a multicast source.
The SSM model is based on PIM-SM and IGMPv3/Multicast Listener Discovery version 2
(MLDv2). The procedure for setting up a multicast forwarding tree on a PIM-SSM network is
similar to the procedure for setting up an SPT on a PIM-SM network. The receiver's DR,
which knows the multicast source address, sends Join messages directly to the source so that
multicast data streams can be sent to the receiver's designated router (DR).
In SSM mode, multicast traffic forwarding is based on (S, G) channels. To receive the multicast traffic
of a channel, a multicast user must join the channel. A multicast user can join or leave a multicast
channel by subscribing to or unsubscribing from the channel. Currently, only IGMPv3 can be used for
channel subscription or unsubscription.
Related Concepts
PIM-SSM implementation is based on PIM-SM. For details about basic concepts, see Concepts
in 1.11.4.2.1 PIM-SM.
Implementation
The process for forwarding multicast data in a PIM-SSM domain is as follows:
1. Neighbor Discovery
Each PIM device in a PIM-SSM domain periodically sends Hello messages to all other
PIM devices in the domain to discover PIM neighbors and maintain PIM neighbor
relationships.
By default, a PIM device permits other PIM control messages or multicast messages from a neighbor,
irrespective of whether the PIM device has received Hello messages from the neighbor. However, if a
PIM device has the neighbor check function, the PIM device permits other PIM control messages or
multicast messages from a neighbor only after the PIM device has received Hello messages from the
neighbor.
2. DR Election
PIM devices exchange Hello messages to elect a DR on a shared network segment. The
receiver's DR is the only multicast data forwarder on the segment.
3. SPT setup
Users on a PIM-SSM network can know the multicast source address and can, therefore,
specify the source when joining a multicast group. After receiving a Report message
from a user, the receiver's DR sends a Join message towards the multicast source to
establish an SPT between the source and the user. Multicast data is then sent by the
multicast source to the user along the SPT.
SPT establishment can be triggered by user join requests (both dynamic and static) and
SSM-mapping.
The DR in an SSM scenario is valid only in the shared network segment connected to group
members. The DR on the group member side sends Join messages to the multicast source, creates
the (S, G) entry hop by hop, and then sets up an SPT.
PIM-SSM supports PIM silent, BFD for PIM, and a PIM DR switchover delay.
Currently, BFD for PIM can be used on both IPv4 PIM-SM/Source-Specific Multicast (SSM) and IPv6
PIM-SM/SSM networks.
As shown in Figure 1-934, on the shared network segment where user hosts reside, a PIM
BFD session is set up between the downstream interface Port 2 of Device B and the
downstream interface Port 1 of Device C. Both ports send BFD packets to detect the status of
the link between them.
Port 2 of Device B is elected as a DR for forwarding multicast data to the receiver. If Port 2
fails, BFD immediately notifies the RM module of the session status and the RM module then
notifies the PIM module. The PIM module triggers a new DR election. Port 1 of Device C is
then elected as a new DR to forward multicast data to the receiver.
PIM IPsec can authenticate the following types of PIM packets:
PIM multicast protocol packets, such as Hello and Join/Prune packets.
PIM unicast protocol packets, such as Register and Register-Stop packets.
NOTE
For IPsec feature description, see 1.16.11 IPsec.
Background
SPT setup relies on unicast routes. If a link or node failure occurs, a new SPT can be set up
only after unicast routes are converged. This process is time-consuming and may cause severe
multicast traffic loss.
PIM FRR resolves these issues. It allows a device to search for a backup FRR route based on
unicast routing information and send the PIM Join message of a multicast receiver along both
the primary and backup routes, setting both primary and backup SPTs. The cross node of the
primary and backup links can receive one copy of a multicast flow from each of the links.
Each device's forwarding plane permits the multicast traffic on the primary link and discards
that on the backup link. However, the forwarding plane starts permitting multicast traffic on
the backup link as soon as the primary link fails, thus minimizing traffic loss.
PIM FRR supports fast SPT switchovers only in IPv4 PIM-SSM or PIM-SM. In extranet scenarios, PIM
FRR supports only source VPN, not receiver VPN entries.
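The permit/discard rule described above can be sketched as a simple forwarding-plane check. This is an illustrative model, not NE20E software; the interface names are assumptions:

```python
# Illustrative sketch of the PIM FRR forwarding-plane rule: traffic arriving on
# the primary inbound interface is accepted; traffic on the backup inbound
# interface is discarded until the primary link fails.
class FrrEntry:
    def __init__(self, primary_iif, backup_iif):
        self.primary_iif = primary_iif
        self.backup_iif = backup_iif
        self.primary_up = True

    def accept(self, iif):
        if self.primary_up:
            return iif == self.primary_iif
        return iif == self.backup_iif

entry = FrrEntry("GE0/1", "GE0/2")
assert entry.accept("GE0/1") and not entry.accept("GE0/2")
entry.primary_up = False        # primary link failure detected
assert entry.accept("GE0/2")    # the backup flow is now permitted
```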
Implementation
PIM FRR implementation involves three steps:
1. Setup of primary and backup SPTs for a multicast receiver
Each PIM-SM/PIM-SSM device adds the inbound interface information to the (S, G)
entry of the receiver, and then searches for a backup FRR route based on unicast routing
information. After a backup FRR route is discovered, each device adds the backup
route's inbound interface information to the (S, G) entry so that two routes become
available from the source to the multicast group requested by the receiver. Each device
then sends a PIM Join message along both the primary and backup routes to set up two
SPTs. Figure 1-935 shows the process of setting up two SPTs for a multicast receiver.
Figure 1-935 Setup of primary and backup SPTs for a multicast receiver
Table 1-271 PIM FRR implementation before and after a link or node failure occurs
Remote primary In Figure 1-940, Device A permits the In Figure 1-941, Device
link multicast traffic on the primary link and A starts permitting
discards that on the backup link. multicast traffic on the
backup link (Device C ->
Figure 1-940 PIM FRR implementation Device D -> Device A)
before a remote primary link failure as soon as Device A
occurs detects the remote
primary link failure.
3. Traffic switchback
After the link or node failure is resolved, PIM detects a route change at the protocol layer,
starts route switchback, and then smoothly switches traffic back to the primary link.
PIM FRR in Scenarios Where IGP FRR Cannot Fulfill Backup Root Computation
Independently
PIM FRR relies on IGP FRR to compute both primary and backup routes. However, on a live
network, backup route computation may fail on some nodes as the number of network nodes
increases. Therefore, if IGP FRR cannot fulfill route computation independently on a
network, deploy IP FRR to work jointly with IGP FRR. The following example uses a ring
network.
On the ring network shown in Figure 1-942, Device C connects to a multicast receiver. The
primary multicast traffic link for this receiver is Device C -> Device B -> Device A. To
compute a backup route for the link Device D -> Device C, IGP FRR requires that the cost of
link Device D -> Device A be less than the cost of link Device C -> Device A plus the cost of
link Device D -> Device C. That is, the cost of link Device D -> Device E -> Device F ->
Device A must be less than the cost of link Device C -> Device A plus the cost of link Device
D -> Device C. This ring network does not meet this requirement; therefore, IGP FRR cannot
compute a backup route for link Device D -> Device C.
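The loop-free condition above can be checked numerically. The costs below are assumed for illustration (they are not given in the figure):

```python
# Illustrative check of the IGP FRR (LFA) condition for backing up the link
# Device D -> Device C: the cost from D to A via E and F must be less than
# the cost from C to A plus the cost of the D -> C link.
def lfa_backup_possible(cost_d_to_a, cost_c_to_a, cost_d_to_c):
    return cost_d_to_a < cost_c_to_a + cost_d_to_c

# Assumed ring costs of 10 per hop: D->E->F->A costs 30, C->B->A costs 20,
# and D->C costs 10. Since 30 < 20 + 10 is false, IGP FRR computes no backup.
print(lfa_backup_possible(30, 20, 10))  # False
```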
To resolve this issue, manually specify a backup route to the multicast source. Configure a
static route whose destination is the multicast source, next hop is Device D, and preference is
lower than that of the IGP route, as follows.
The IGP route is Device C -> Device B -> Device A, which has a higher preference and
functions as the primary link.
The static route is Device C -> Device D -> Device E -> Device F -> Device A, which
has a lower preference and functions as the backup link.
Before a link or node failure occurs, Device C permits the multicast traffic on the primary link
and discards that on the backup link. After a link or node failure occurs, Device C starts
permitting the multicast traffic on the backup link as soon as it detects the failure.
Benefits
PIM FRR helps improve the reliability of multicast services and minimize service loss for
users.
Field Description
Reserved Reserved
Checksum Checksum
Hello Messages
PIM devices periodically send Hello messages through all PIM interfaces to discover
neighbors and maintain neighbor relationships.
In an IP packet that carries a Hello message, the source address is a local interface's address,
the destination address is 224.0.0.13, and the TTL value is 1. The IP packet is transmitted in
multicast mode.
Field Description
Register Messages
When a multicast source becomes active on a PIM-SM network, the source's DR sends a
Register message to register with the rendezvous point (RP).
In an IP packet that carries a Register message, the source address is the address of the
source's DR, and the destination address is the RP's address. The message is transmitted in
unicast mode.
Field Description
Type Message type
The value is 1.
Reserved The field is set to 0 when the message is sent and is ignored
when the message is received.
Checksum Checksum
B Border bit
N Null-Register bit
Reserved2 Reserved
The field is set to 0 when the message is sent and this field is
ignored when the message is received.
Multicast data packet The source's DR encapsulates the received multicast data in a
Register message and sends the message to the RP. After
decapsulating the message, the RP learns the (S, G)
information of the multicast data packet.
A multicast source can send data to multiple groups, and therefore a source's DR must send
Register messages to the RP of each target multicast group. A Register message encapsulates
only one multicast data packet, so it carries only one copy of (S, G) information.
In the register suppression period, a source's DR sends Null-Register messages to notify the
RP of the source's active state. A Null-Register message contains only an IP header, including
the source address and group address. After the register suppression times out, the source's
DR encapsulates a Register message into a multicast data packet again.
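The register suppression behavior above can be sketched as a small state machine. This is an illustrative model of the described behavior, not the product implementation:

```python
# Illustrative sketch of the source DR's register state: after a Register-Stop,
# the DR suppresses data-encapsulated Register messages and sends Null-Register
# messages to keep the RP informed that the source is active; when the
# suppression period times out, it resumes data-encapsulated Registers.
class SourceDr:
    def __init__(self):
        self.suppressed = False

    def message_to_send(self):
        return "Null-Register" if self.suppressed else "Register(data)"

    def on_register_stop(self):
        self.suppressed = True       # enter register suppression

    def on_suppression_timeout(self):
        self.suppressed = False      # resume data-encapsulated Registers

dr = SourceDr()
assert dr.message_to_send() == "Register(data)"
dr.on_register_stop()
assert dr.message_to_send() == "Null-Register"
dr.on_suppression_timeout()
assert dr.message_to_send() == "Register(data)"
```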
Register-Stop Messages
Field Description
Type Message type
The value is 2.
Group Address Multicast group address
Source Address Multicast source address
An RP can serve multiple groups, and a group can receive data from multiple sources.
Therefore, an RP may simultaneously perform multiple (S, G) registrations.
A Register-Stop message carries only one copy of the (S, G) information. When an RP sends a
Register-Stop message to a source's DR, the RP can terminate only one (S, G) registration.
After receiving the Register-Stop message carrying the (S, G) information, the source's DR
stops encapsulating (S, G) packets. The DR still uses Register messages to encapsulate
packets destined for other groups and sends them to those groups' RPs.
Join/Prune Messages
A Join/Prune message can contain both Join messages and Prune messages. A Join/Prune
message that contains only a Join message is called a Join message. A Join/Prune message
that contains only a Prune message is called a Prune message.
When a PIM device is not required to send data to its downstream interfaces, the PIM
device sends Prune messages through its upstream interfaces to instruct upstream devices
to stop forwarding packets to the network segment on which the PIM device resides.
When a receiver starts to require data from a PIM-SM network, the receiver's DR sends a
Join message through the reverse path forwarding (RPF) interface towards the RP to
instruct the upstream neighbor to forward packets to the receiver. The Join message is
sent in the upstream direction hop by hop to set up an RPT.
When an RP triggers an SPT switchover, the RP sends a Join message through the RPF
interface connected to the source to instruct the upstream neighbor to forward packets to
the network segment. The Join message is sent in the upstream direction hop by hop to
set up an SPT.
When a receiver's DR triggers an SPT switchover, the DR sends a Join message through
the RPF interface connected to the source to instruct the upstream neighbor to forward
packets to the network segment. The Join message is sent in the upstream direction hop
by hop to set up an SPT.
A PIM network segment may be connected to a downstream interface and multiple
upstream interfaces. After an upstream interface sends a Prune message, if other
upstream interfaces still require multicast packets, these interfaces must send Join
messages within the override-interval. Otherwise, the downstream interfaces responsible
for forwarding packets on the network segment do not perform the prune action.
If PIM is enabled on the interfaces of user-side routers, a receiver's DR is elected, and outbound
interfaces are added to the PIM DR's outbound interface list. The PIM DR then sends Join messages
to the RP.
In an IP packet that carries a Join/Prune message, the source address is a local interface's
address, the destination address is 224.0.0.13, and the TTL value is 1. The message is
transmitted in multicast mode.
Field Description
Type Message type
The value is 3.
Upstream Neighbor Address: Upstream neighbor's address, that is, the address of the downstream interface that receives the Join/Prune message and performs the Join or Prune action
Number of Groups: Number of groups contained in the message
Holdtime: Duration (in seconds) that an interface remains in the Join or Prune state
Group Address: Group address
Number of Joined Sources: Number of sources that the router joins
Number of Pruned Sources: Number of sources that the router prunes
Field Description
Joined Source Address Address of the source that the router joins
Pruned Source Address Address of the source that the router prunes
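The logical layout of these fields can be modeled as a simple structure. This sketch mirrors only the fields listed above; real PIM encodes them as packed binary fields on the wire:

```python
# Illustrative model of a Join/Prune message based on the fields listed above.
# This is a logical sketch, not the on-wire encoding.
from dataclasses import dataclass, field
from typing import List

@dataclass
class GroupRecord:
    group_address: str
    joined_sources: List[str] = field(default_factory=list)
    pruned_sources: List[str] = field(default_factory=list)

@dataclass
class JoinPruneMessage:
    msg_type: int          # always 3 for Join/Prune
    upstream_neighbor: str # downstream interface that performs the action
    holdtime: int          # seconds to remain in the Join or Prune state
    groups: List[GroupRecord] = field(default_factory=list)

# Assumed example values: one group with one joined source.
msg = JoinPruneMessage(3, "10.1.1.1", 210,
                       [GroupRecord("225.1.1.1", joined_sources=["192.168.0.10"])])
assert len(msg.groups) == 1
```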
Bootstrap Messages
Field Description
Type Message type
The value is 4.
Fragment Tag Random number used to identify fragments belonging to the same Bootstrap message
Hash Mask length Length of the hash mask of the C-BSR
BSR-priority C-BSR priority
BSR-Address C-BSR address
Group Address Group address
RP-Count Total number of candidate-rendezvous points (C-RPs) that
serve the group
Frag RP-Cnt Number of C-RP addresses included in this fragment of the
Bootstrap message for the corresponding group range.
This field facilitates parsing of the RP-Set for a given group
range, when carried over more than one fragment.
RP-address C-RP address
RP-holdtime Aging time of the advertisement message sent by the C-RP
RP-Priority C-RP priority
The BSR boundary of a PIM interface can be set by using the pim bsr-boundary command
on the interface. Multiple BSR boundary interfaces divide the network into different PIM-SM
domains. Bootstrap messages cannot pass through the BSR boundary.
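When several C-RPs serve the same group range, the hash mask carried in the Bootstrap message selects one RP per group. The PIM-SM specification (RFC 7761) defines the hash as Value(G, M, C) = (1103515245 * ((1103515245 * (G & M) + 12345) XOR C) + 12345) mod 2^31; the C-RP with the highest value (highest address on ties) serves the group. A sketch of that computation:

```python
# Sketch of the group-to-RP hash from the PIM-SM specification (RFC 7761).
# The C-RP with the highest hash value serves the group; ties go to the
# C-RP with the highest address.
import ipaddress

def hash_value(group, mask_len, c_rp):
    g = int(ipaddress.ip_address(group))
    c = int(ipaddress.ip_address(c_rp))
    m = (0xFFFFFFFF << (32 - mask_len)) & 0xFFFFFFFF
    return (1103515245 * ((1103515245 * (g & m) + 12345) ^ c) + 12345) % 2**31

def select_rp(group, mask_len, c_rps):
    return max(c_rps, key=lambda rp: (hash_value(group, mask_len, rp),
                                      int(ipaddress.ip_address(rp))))
```

A longer hash mask maps fewer groups to each value, spreading consecutive group addresses across the C-RPs.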
Assert Messages
On a shared network segment, if a PIM router receives an (S, G) packet from the downstream
interface of the (S, G) or (*, G) entry, it indicates that other forwarders exist on the network
segment. The PIM router then sends an Assert message through the downstream interface to
participate in the forwarder election. The router that fails in the forwarder election stops
forwarding multicast packets through the downstream interface.
In an IP packet that carries an Assert message, the source address is a local interface's address,
the destination address is 224.0.0.13, and the TTL value is 1. The packet is transmitted in
multicast mode.
Field Description
Type Message type
The value is 5.
Group Address Group address
Source address This field is a multicast source address if a unique forwarder is
elected for (S, G) entries, and this field is 0 if a unique forwarder
is elected for (*, G) entries.
R RPT bit
This field is 0 if a unique forwarder is elected for (S, G) entries,
and this field is 1 if a unique forwarder is elected for (*, G)
entries.
Metric Preference Priority of the unicast path to the source address
If the R field is 1, this field indicates the priority of the unicast
path to the RP.
Metric Cost of the unicast route to the source address
If the R field is 1, this field indicates the cost of the unicast path
to the RP.
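The Assert winner is chosen by comparing these fields: a lower metric preference wins, then a lower metric, and a higher interface address breaks ties. An illustrative sketch:

```python
# Illustrative sketch of PIM Assert forwarder election: lower metric preference
# wins, then lower metric; the higher interface IP address breaks ties.
import ipaddress

def assert_winner(a, b):
    """a, b: (metric_preference, metric, ip_string) tuples; returns the winner."""
    def key(x):
        # Negate the address so that a higher address sorts as a smaller key.
        return (x[0], x[1], -int(ipaddress.ip_address(x[2])))
    return a if key(a) < key(b) else b

# Equal preference and metric: the router with the higher address forwards.
print(assert_winner((0, 10, "10.1.1.1"), (0, 10, "10.1.1.2")))  # (0, 10, '10.1.1.2')
```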
When a dynamic RP is used, C-RPs periodically send Advertisement messages to notify the
BSR of the range of groups they want to serve.
In an IP packet that carries an Advertisement message, the source address is the C-RP's
address, and the destination address is the BSR's address. The packet is transmitted in unicast
mode.
Field Description
Background
IP and MPLS are generally used to forward packets on traditional core and backbone
networks. Deployment of multicast services, such as IPTV, multimedia conferences, and
real-time online games continues to increase on IP/MPLS networks. These services require
sufficient bandwidth, assured QoS, and high reliability on the bearer network. Currently, the
following multicast solutions are used to run multicast services, but these solutions cannot
meet the requirements of multicast services and network carriers:
IP multicast technology: It can be deployed on point-to-point (P2P) networks to run
multicast services, reducing network upgrade and maintenance costs. Similar to IP
unicast, IP multicast does not support QoS or traffic planning and has low reliability.
Multicast applications place high demands on real-time transmission and reliability, and
IP multicast technology cannot meet these requirements.
Establishing a dedicated multicast network: A dedicated multicast network is usually
constructed over Synchronous Optical Network (SONET)/Synchronous Digital
Hierarchy (SDH). SONET/SDH has high reliability and provides a high transmission
rate. However, such a network is expensive to construct, incurs significant OPEX, and
must be maintained separately.
IP/MPLS backbone network carriers require a multicast solution with high TE capabilities to
run multicast services on existing IP/MPLS backbone network devices.
Multicast over P2MP TE tunnels can meet the carriers' requirements by establishing tree
tunnels to transmit multicast data. It has the advantages of high IP multicast packet
transmission efficiency and assured MPLS TE end-to-end (E2E) QoS.
Benefits
Deploying P2MP TE on an IP/MPLS backbone network brings the following benefits:
Improves network bandwidth utilization.
Provides sufficient bandwidth for multicast services.
Simplifies network deployment using multicast protocols by not requiring PIM and
IGMP to be deployed on core devices on the network.
Related Concepts
P2MP TE data forwarding is similar to IP multicast data forwarding. A branch node copies
MPLS packets, swaps existing labels with outgoing labels in the MPLS packets, and sends
each separate copy of the MPLS packets over every sub-LSP. Because packets are replicated
only at branch nodes, this process improves network bandwidth resource usage.
For details on P2MP TE concepts, see Related Concepts in the HUAWEI NE20E-S2 Feature
Description - MPLS.
Ingresses
The P2MP tunnel interfaces of the ingresses (PE1 and PE2) direct multicast data to the
P2MP TE tunnel.
Egresses
The egresses (PE3, PE4, PE5, and PE6) must be configured to ignore the Unicast
Reverse Path Forwarding (URPF) check. Whether to configure multicast source proxy
on the egresses is based on the location of the rendezvous point (RP).
1.11.4.3 Applications
1.11.4.3.1 PIM Intra-domain
Continuing development of the Internet has led to considerable growth in the types of data,
voice, and video information exchanged online. New services, such as VoD and BTV, have
emerged and continue to develop. Multicast plays an increasingly important role in the
transmission of these services. This section describes Protocol Independent Multicast-Sparse
Mode (PIM-SM) intra-domain networking.
Figure 1-957 shows a large-scale network with multicast services deployed. An IGP has been
deployed, and each network segment route is reachable. Group members are distributed
sparsely. Users on the network require VoD services, but network bandwidth resources are
limited.
Implementation Solution
As shown in Figure 1-957, Host A and Host B are multicast information receivers, each
located on a different leaf network. The hosts receive VoD information in multicast mode.
PIM-SM is used in the entire PIM domain. Device B is connected to multicast source S1.
Device A is connected to multicast source S2. Device C is connected to Host A. Devices E
and F are connected to Host B.
Network configuration details are as follows:
PIM-SM is enabled on all router interfaces.
As shown in Figure 1-957, multicast sources are densely distributed.
Candidate-Rendezvous Points (C-RPs) can be deployed on devices close to the multicast
sources. Loopback 0 interfaces on Devices A and D are configured as
candidate-bootstrap routers (C-BSRs) and C-RPs. A BSR is elected among the C-BSRs.
An RP is elected among the C-RPs.
The RP deployment guidelines are as follows:
− Static RPs are recommended on small-/medium-sized networks because a
small-/medium-sized network is stable and has low forwarding requirements for an
RP.
If there is only one multicast source on the network, setting the device directly
connected to the multicast source as a static RP is recommended. The source's
designated router (DR) also functions as the RP and does not need to register with
the RP.
When a static RP is used, all routers, including the RP, must have the same
information about the RP and the multicast groups that the RP serves.
− Dynamic RPs are recommended on large-scale networks because dynamic RPs are
easy to maintain and provide high reliability.
Dynamic RP
To ensure RP information consistency, do not configure static RPs on some routers but dynamic RPs on
other routers in the same PIM domain.
IGMP is run between Device C and Host A and between Device E, Device F, and Host B.
When configuring IGMP on router interfaces, ensure that interface parameters are
consistent. All routers connected to the same network must run the same IGMP version
(IGMPv2 is recommended) and be configured with the same parameter values, such as
the interval at which IGMP Query messages are sent and holdtime of memberships.
Otherwise, IGMP group memberships on different routers will be inconsistent.
Hosts A and B send Join messages to the RP to require information from the multicast
source.
Configuring interfaces on network edge devices to statically join all multicast groups is recommended to
speed up channel switching and to provide a stable viewing experience for users.
Implementation Solution
On the network shown in Figure 1-958, Hosts A and B are multicast information receivers,
each located on a different leaf network. The hosts receive VoD information in multicast mode.
PIM-SSM is used throughout the PIM domain. Device B is connected to multicast source S1.
Device A is connected to multicast source S2. Device C is connected to Host A. Devices E
and F are connected to Host B.
Network configuration details are as follows:
PIM-SSM is enabled on all router interfaces.
A receiver in a PIM-SSM scenario can send a Join message directly to a specific multicast source. A
shortest path tree (SPT) is established between the multicast source and receiver, without requiring
the network to maintain rendezvous points (RPs).
IGMP runs between Device C and Host A, between Device E and Host B, and between
Device F and Host B.
When configuring IGMP on router interfaces, ensure that interface parameters are
consistent. All routers connected to the same network must run the same IGMP version
(IGMPv2 is recommended) and be configured with the same interface parameter values,
such as the Query timer value and hold time of memberships. If the IGMP versions or
interface parameters are different, IGMP group memberships are inconsistent on
different routers.
Host A can send Join messages to S1. Host B can send Join messages to S2. Information
sent by these multicast sources can reach user hosts.
Configuring interfaces on network edge devices to statically join all multicast groups is recommended to
speed up channel switching and to provide a stable viewing experience for users.
Service Overview
There is an increasing diversity of multicast services, such as IPTV, multimedia conferences,
and massively multiplayer online role-playing games (MMORPGs). These services are
transmitted over a service bearer network with the following functions:
Forwards multicast traffic even during traffic congestion.
Rapidly detects network faults and switches traffic to a standby link.
Networking Description
Point-to-multipoint (P2MP) Traffic Engineering (TE), supported on NE20Es, is used on the
IP/MPLS backbone network shown in Figure 1-959. P2MP TE helps the network prevent
multicast traffic congestion and maintain reliability.
Feature Deployment
Figure 1-959 illustrates how P2MP TE tunnels are used to transmit IP multicast services. The
process consists of the following stages:
Import multicast services.
− An Internet Group Management Protocol (IGMP) static group is configured on a
network-side interface of each service router (SR). SR1 runs Protocol
Independent Multicast (PIM). Ingress PE1, functioning as a host, sends an IGMP
Join message to SR1. After receiving the message, SR1 generates a multicast
forwarding entry and forwards multicast traffic to a PE. A traffic policy is
Service Overview
There is an increasing diversity of multicast services, such as IPTV, multimedia conferences,
and massively multiplayer online role-playing games (MMORPGs). To bear these services,
the service providers' networks have to meet the following requirements:
Forwards multicast traffic even during traffic congestion.
Rapidly detects network faults and switches traffic to a standby link.
Networking Description
The PIM FRR function deployed on user-access devices helps the network prevent multicast
traffic congestion and maintain reliability. PIM FRR is used on the IPTV service network
shown in Figure 1-960.
Feature Deployment
PIM FRR is used to transmit and protect IP multicast services. The process consists of the
following stages:
Deploy IGP LFA FRR.
Deploy IS-IS LFA FRR or OSPF LFA FRR on the protection nodes, such as DeviceA, so
that the nodes can generate primary and backup unicast routes.
Configure PIM FRR.
PIM FRR is configured on the protection nodes, such as DeviceA. When a user joins, a
primary multicast forwarding entry and a backup multicast forwarding entry are generated.
If the network operates normally, the protection nodes receive the multicast traffic only
from the primary link and drop the traffic from the backup link. If the primary link fails,
the protection nodes rapidly switch to the backup link to protect the multicast traffic.
Service Overview
In a non-ECMP network, the IGP LFA FRR function may fail to calculate unicast routes. To
avoid multicast service failures, configure static primary and backup routes to establish
primary and backup links.
Networking Description
The PIM FRR function deployed on user-access devices helps the network prevent multicast
traffic congestion and maintain reliability. PIM FRR is used on the IPTV service network
shown in Figure 1-961.
Feature Deployment
PIM FRR is used to transmit and protect IP multicast services. The process consists of the
following stages:
Configure FRR based on multicast static routes.
Configure FRR based on multicast static routes on each node of the ring so that each
node can generate primary and backup unicast routes.
Configure PIM FRR.
PIM FRR is configured on each node of the ring. When a user joins, a primary multicast
forwarding entry and a backup multicast forwarding entry are generated. If the network
operates normally, the protection nodes receive the multicast traffic only from the
primary link and drop the traffic from the backup link. If the primary link fails, the
protection nodes rapidly switch to the backup link to protect the multicast traffic.
1.11.4.4 Appendix
Feature IPv4 PIM IPv6 PIM Implementation Difference
1.11.5 MSDP
1.11.5.1 Introduction
Definition
Multicast Source Discovery Protocol (MSDP) is an inter-domain multicast solution that
applies to interconnected multiple Protocol Independent Multicast-Sparse Mode (PIM-SM)
domains. Currently, MSDP applies only to IPv4.
Purpose
A network composed of PIM-SM devices is called a PIM-SM network. In real-world
situations, a large PIM-SM network may be maintained by multiple Internet service providers
(ISPs).
A PIM-SM network uses Rendezvous Points (RPs) to forward multicast data. A large
PIM-SM network can be divided into multiple PIM-SM domains. On a PIM-SM network, an
RP does not communicate with RPs in other domains. An RP knows only the local multicast
source's location and distributes data only to local domain users. A multicast source registers
only with the local domain RP, and hosts send Join messages only to the local domain RP.
Using this approach, PIM-SM domains implement load splitting among RPs, enhance
network stability, and facilitate network management.
After a large PIM-SM network is divided into multiple PIM-SM domains, a mechanism is
required to implement inter-domain multicast. MSDP provides this mechanism, enabling
hosts in the local PIM-SM domain to receive multicast data from sources in other PIM-SM
domains.
In this section, a PIM-SM domain refers to the service range of an RP. A PIM-SM domain can be a
domain defined by bootstrap router (BSR) boundaries or a domain formed after you configure static RPs
on the router.
1.11.5.2 Principles
1.11.5.2.1 Inter-Domain Multicast in MSDP
MSDP Peer
On a PIM-SM network, MSDP enables Rendezvous Points (RPs) in different domains to
interwork. MSDP also enables different PIM-SM domains to share multicast source
information by establishing MSDP peer relationships between RPs.
An MSDP peer relationship can be set up between two RPs in the following scenarios:
Two RPs belong to the same AS but different PIM-SM domains.
Two RPs belong to different autonomous systems (ASs).
To ensure successful reverse path forwarding (RPF) checks in an inter-AS scenario, a BGP or
a Multicast Border Gateway Protocol (MBGP) peer relationship must be established on the
same interfaces as the MSDP peer relationship.
Basic Principles
Setting up MSDP peer relationships between RPs in different PIM-SM domains enables
communication between these domains, thereby forming an MSDP-connected graph.
MSDP peers exchange Source-Active (SA) messages. An SA message carries (S, G)
information registered by the source's DR with the RP. Message exchange between MSDP
peers ensures that SA messages sent by any RP can be received by all the other RPs.
Figure 1-962 shows a PIM-SM network divided into four PIM-SM domains. The source in the
PIM-SM 1 domain sends data to multicast group G. The receiver in the PIM-SM 3 domain is a
member of group G. RP 3 and the receiver in the PIM-SM 3 domain maintain an RPT for group G.
As shown in Figure 1-962, the receiver in the PIM-SM 3 domain can receive data sent by the
source in the PIM-SM 1 domain after MSDP peer relationships are set up between RP 1, RP 2,
and RP 3. The data processing flow is as follows:
1. The source sends multicast data to group G. DR 1 encapsulates the data into a Register
message and sends the message to RP 1.
2. As the source's RP, RP 1 creates an SA message containing the IP addresses of the
source, group G, and RP 1. RP 1 sends the SA message to RP 2.
3. Upon receiving the SA message, RP 2 performs an RPF check on the message. If the
check succeeds, RP 2 forwards the message to RP 3.
4. Upon receiving the SA message, RP 3 performs an RPF check on the message. If the
check succeeds and (*, G) entries exist on RP 3 (indicating that the local domain
contains members of group G), RP 3 creates an (S, G) entry and sends a Join
message with the (S, G) information towards the source hop by hop. A multicast path
(routing tree) from the source to RP 3 is then set up.
5. After the multicast data reaches RP 3 along the routing tree, RP 3 forwards the data to
the receiver along the rendezvous point tree (RPT).
6. After receiving the multicast data, the receiver determines whether to initiate shortest
path tree (SPT) switchover.
Background
If multiple Multicast Source Discovery Protocol (MSDP) peers exist in the same or different
ASs, the following problems may easily occur:
Source active (SA) messages are flooded between peers. Especially when many MSDP
peers are configured in the same PIM-SM domain, reverse path forwarding (RPF) rules
cannot filter out useless SA messages effectively. Each MSDP peer must perform an
RPF check on every received SA message, which imposes a heavy workload on the system.
SA messages are discarded due to RPF check failures.
To resolve these problems, configure a mesh group.
Implementation Principle
A mesh group requires that every pair of MSDP peers in the group establish a peer relationship,
implementing full-mesh connections within the group. To implement the mesh group function, add
all MSDP peers, whether in the same AS or in different ASs, to the same mesh group on a multicast device.
When a member of the mesh group receives an SA message, it checks the source of the SA
message:
If the SA message is sent by a member of the mesh group, the member directly accepts
the message without performing the RPF check. In addition, it does not forward the
message to other members in the mesh group.
In real-world situations, adding all MSDP peers, whether in the same AS or in different ASs, to the same
mesh group is recommended to prevent SA messages from being discarded due to RPF check failures.
If the SA message is sent by an MSDP peer outside the mesh group, the member
performs the RPF check on the SA message. If the SA message passes the check, the
member forwards it to other members of the mesh group.
The mesh group mechanism greatly reduces SA messages to be exchanged among MSDP
peers, relieving the workload of the multicast device.
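The two rules above can be sketched as a single dispatch function. This is an illustrative model of the described mesh-group behavior, not the device implementation:

```python
# Illustrative sketch of MSDP mesh-group SA handling: SA messages from mesh
# members are accepted without an RPF check and are not re-flooded into the
# group; SA messages from outside are RPF-checked and, if valid, flooded to
# all mesh-group members.
def handle_sa(sender, mesh_members, rpf_check):
    """Returns (action, peers_to_forward_to)."""
    if sender in mesh_members:
        return ("accept", [])                 # no RPF check, no re-flood
    if rpf_check(sender):
        return ("accept", sorted(mesh_members))
    return ("drop", [])

mesh = {"RP1", "RP2", "RP3"}
# From a mesh member: accepted directly, not forwarded within the group.
assert handle_sa("RP2", mesh, lambda s: True) == ("accept", [])
# From outside the group: RPF-checked, then flooded to every member.
assert handle_sa("RP9", mesh, lambda s: True) == ("accept", ["RP1", "RP2", "RP3"])
```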
Usage Scenario
In a traditional PIM-SM domain, each multicast group is mapped to only one rendezvous
point (RP). When the network is overloaded or traffic is heavy, many network problems occur.
For example, the RP may be overloaded, routes may converge slowly if the RP fails, or the
multicast forwarding path may not be optimal.
To resolve those problems, Anycast-RP is used in MSDP. Anycast-RP allows you to configure
multiple loopback interfaces as RPs in a PIM-SM domain, assign the same IP address to each
of these loopback interfaces, and set up MSDP peer relationships between these RPs. These
configurations help select the optimal paths and RPs and implement load splitting among the
RPs.
Implementation Principle
As shown in Figure 1-963, in a PIM-SM domain, the multicast sources, S1 and S2, send
multicast data to the multicast group G. U1 and U2 are members of group G.
1.11.5.3 Applications
Inter-Domain Multicast
Figure 1-964 shows an inter-domain multicast application.
An MSDP peer relationship is set up between rendezvous points (RPs) in two different
PIM-SM domains. Multicast source information can then be shared between the two
domains.
After multicast data reaches RP 1 (the source's RP), RP 1 sends a source active (SA)
message that carries the multicast source information to RP 2.
RP 2 initiates a shortest path tree (SPT) setup request to the source.
RP 2 forwards the multicast data to the receiver in the local domain.
After Receiver receives the multicast data, it independently determines whether to
initiate an SPT switchover.
Anycast-RP
Figure 1-965 shows an Anycast-RP application.
Device 1 and Device 2 function as RPs and establish an MSDP peer relationship between
each other.
Intra-domain multicast is performed using this MSDP peer relationship. A receiver sends
a Join message to the nearest RP to set up a rendezvous point tree (RPT).
The multicast source registers with the nearest RP. RPs exchange SA messages to share
the multicast source information.
Each RP joins an SPT with the source's DR at the root.
After receiving the multicast data, the receiver decides whether to initiate an SPT
switchover.
Definition
A multicast forwarding table consists of groups of (S, G) entries. In an (S, G) entry, S
indicates the source information, and G indicates the group information. The multicast route
management module supports multiple multicast routing protocols. The multicast forwarding
table therefore collects multicast routing entries generated by various types of protocols.
Multicast route management includes the following functions:
Reverse path forwarding (RPF) check
Multicast load splitting
Longest-match multicast routing
Multicast multi-topology
Multicast Boundary
Purpose
RPF check
This function is used to find an optimal unicast route to the multicast source and build a
multicast forwarding tree. The outbound interface of the unicast route functions as the
inbound interface of the forwarding entry. Then, when the forwarding module receives a
multicast data packet, the module matches the packet with the forwarding entry and
checks whether the inbound interface of the packet is correct. If the inbound interface of
the packet is identical with the outbound interface of the unicast routing entry, the packet
passes the RPF check; otherwise, the packet fails the RPF check and is discarded. The
RPF check prevents traffic loops in multicast data forwarding.
Multicast load splitting
If a multicast load splitting policy is configured, different forwarding entries that specify
the same multicast source can select different equal-cost routes as RPF routes to guide
multicast data forwarding. The RPF routes of forwarding entries can be hashed to
different equal-cost routes, so that multicast traffic is distributed across these routes.
Longest-match multicast routing
During multicast routing, the router preferentially selects the route with the longest
matched mask length to implement accurate route matching.
Multicast multi-topology
The multicast multi-topology function helps you plan a multicast topology for multicast
services on a physical network. Then, when a multicast device performs the RPF check,
the device searches for routes and builds a multicast forwarding tree only in the multicast
topology. In this manner, the problem that multicast services heavily depend on unicast
routes is addressed.
Multicast Boundary
Multicast boundaries are used to control multicast information transmission by allowing
the multicast information of each multicast group to be transmitted only within a
designated scope. A multicast boundary can be configured on an interface to form a
closed multicast forwarding area. After a multicast boundary is configured for a specific
multicast group on an interface, the interface cannot receive or send multicast packets for
the multicast group.
1.11.6.2 Principles
1.11.6.2.1 RPF Check
Reverse path forwarding (RPF) check is a mechanism that determines whether a multicast
packet is valid. RPF check works as follows: After receiving a multicast packet, a router looks
up the packet source address in the unicast routing table, Multicast Border Gateway Protocol
(MBGP) routing table, Multicast Interior Gateway Protocol (MIGP) routing table, and
multicast static routing table to select an optimal route as an RPF route for the packet. If the
interface on which the packet has arrived is an RPF interface, the RPF check succeeds, and
the packet is forwarded. Otherwise, the RPF check fails, and the packet is dropped.
If the MIGP, MBGP, and MSR routing tables all have candidate routes for the RPF route, the
system selects one optimal route from each of the routing tables. If the routes selected from
each table are Rt_urt (migp), Rt_mbgp, and Rt_msr, the system selects the RPF route based
on the following rules:
By default, the system selects the RPF route based on the route priority.
a. The system compares the priorities of Rt_urt (migp), Rt_mbgp, and Rt_msr. The
route with the smallest priority value is preferentially selected as the RPF route.
b. If Rt_urt (migp), Rt_mbgp, and Rt_msr have the same priority, the system selects
the RPF route in descending order of Rt_msr, Rt_mbgp, and Rt_urt (migp).
If the multicast longest-match command is run to control route selection based on the
route mask:
− The system compares the mask lengths of Rt_urt (migp), Rt_mbgp, and Rt_msr.
The route with the longest mask is preferentially selected as the RPF route.
− If Rt_urt (migp), Rt_mbgp, and Rt_msr have the same mask length, the system
compares their priorities. The route with the smallest priority value is preferentially
selected as the RPF route.
− If Rt_urt (migp), Rt_mbgp, and Rt_msr have the same mask length and priority, the
system selects the RPF route in descending order of Rt_msr, Rt_mbgp, and Rt_urt
(migp).
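The two selection policies above can be sketched in a few lines. The route attributes below are invented example values; only the comparison order reflects the rules in this section:

```python
# Hedged sketch of RPF route selection among Rt_urt (migp), Rt_mbgp, and
# Rt_msr. Tie-break order when other attributes are equal: msr > mbgp > urt.

TYPE_RANK = {"msr": 3, "mbgp": 2, "urt": 1}

def select_rpf(routes, longest_match=False):
    """routes: list of dicts with 'type', 'priority' (smaller wins), 'mask'."""
    if longest_match:
        # multicast longest-match: longest mask first, then smallest
        # priority value, then route type rank.
        key = lambda r: (-r["mask"], r["priority"], -TYPE_RANK[r["type"]])
    else:
        # Default: smallest priority value first, then route type rank.
        key = lambda r: (r["priority"], -TYPE_RANK[r["type"]])
    return min(routes, key=key)

routes = [
    {"type": "urt",  "priority": 10, "mask": 24},
    {"type": "mbgp", "priority": 10, "mask": 16},
    {"type": "msr",  "priority": 10, "mask": 8},
]
# Default policy: all priorities equal, so the static multicast route (msr)
# wins by type rank. With longest-match enabled, the /24 urt route wins.
```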
For example, on the network shown in Figure 1-966, Device C receives packets on both Port
1 and Port 2 from the same source. The routing table on Device C shows that the RPF
interface for this source is Port 2. Therefore, the RPF check fails for the packet on Port 1 but
succeeds for the packet on Port 2. Device C therefore drops the packet received on Port 1 and
forwards the packet received on Port 2.
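The check performed by Device C can be sketched as follows. The route and interface names mirror the figure but are otherwise illustrative:

```python
# Minimal RPF-check sketch: a multicast packet passes only if it arrived on
# the interface that the unicast route back to the source uses.

# Unicast routing table: source prefix -> outbound interface toward source.
UNICAST_ROUTES = {"192.168.1.0/24": "Port2"}

def rpf_check(source_route: str, inbound_if: str) -> bool:
    """The packet passes the RPF check only when its inbound interface
    matches the outbound interface of the unicast route to the source."""
    return UNICAST_ROUTES.get(source_route) == inbound_if

# The copy arriving on Port1 fails the check and is dropped; the copy on
# Port2 passes and is forwarded, preventing multicast forwarding loops.
```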
Multicast group-based load splitting, multicast source-based load splitting, and multicast source- and
multicast group-based load splitting are all methods of hash mode load splitting.
Based on the hash algorithm, a multicast router can select a route among several equal-cost
routes for each multicast group. The routes are used for packet forwarding for the groups. As
a result, multicast traffic for different groups can be split into different forwarding paths.
Based on the hash algorithm, a multicast router can select a route among several equal-cost
routes for each multicast source. The routes are used for packet forwarding for the sources. As
a result, multicast traffic from different sources can be split into different forwarding paths.
Based on the hash algorithm, a multicast router can select a route among several equal-cost
routes for each source-specific multicast group. The routes are used for packet forwarding for
the source-specific multicast groups. As a result, multicast traffic for different source-specific
groups can be split into different forwarding paths.
entries for each equal-cost route. Then the router selects the route with the maximum
calculation result as the forwarding route for this new entry.
If an entry is deleted, the router does not adjust the load distribution of the remaining entries.
Therefore, load splitting may become unbalanced over time.
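The hash-mode splitting described above can be sketched briefly. The hash function and keys below are assumptions for illustration; the device uses its own algorithm:

```python
# Illustrative sketch of hash-mode load splitting: each (S, G) entry is
# hashed onto one of several equal-cost RPF routes.

import zlib

EQUAL_COST_ROUTES = ["PathA", "PathB", "PathC"]

def pick_route(source: str, group: str) -> str:
    """Source- and group-based splitting: hash (S, G) to an equal-cost route."""
    h = zlib.crc32(f"{source},{group}".encode())
    return EQUAL_COST_ROUTES[h % len(EQUAL_COST_ROUTES)]

# Different (S, G) pairs can map to different paths, distributing traffic,
# while a given entry always hashes to the same path, so its forwarding
# route stays stable.
```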
very busy. Network operators can set up another link Device E→Device D→Device A to
carry only multicast services and isolate multicast services from unicast services.
After a receiver sends a Join message to a multicast router, the multicast router performs
an RPF check based on the unicast route in the multicast topology and establishes an
MDT hop by hop. The multicast data then travels through the path Device E → Device D
→ Device A and reaches the receiver.
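The topology-restricted RPF lookup can be sketched as follows, under invented topology and interface names:

```python
# Sketch of multicast multi-topology RPF: the lookup is confined to the
# multicast topology's routing table, so the multicast tree follows the
# dedicated Device E -> Device D -> Device A link even though the unicast
# (base) topology prefers a different path.

TOPOLOGY_ROUTES = {
    "base":      {"10.1.0.0/16": "toDeviceB"},  # busy link carrying unicast
    "multicast": {"10.1.0.0/16": "toDeviceD"},  # dedicated multicast link
}

def rpf_interface(source_prefix: str, topology: str = "multicast") -> str:
    """With multi-topology enabled, the RPF check searches routes only in
    the multicast topology, decoupling multicast from unicast routes."""
    return TOPOLOGY_ROUTES[topology][source_prefix]
```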
Usage Scenario
Multicast boundaries are used to control multicast information transmission by allowing the
multicast information of each multicast group to be transmitted only within a designated
scope. A multicast boundary can be configured on an interface to form a closed multicast
forwarding area. After a multicast boundary is configured for a specific multicast group on an
interface, the interface cannot receive or send multicast packets for the multicast group.
Principles
As shown in Figure 1-971, Device A, Device B, and Device C form multicast domain 1.
Device D, Device E, and Device F form multicast domain 2. The two multicast domains
communicate through Device B and Device D.
To isolate the data for a multicast group G from the other multicast domain, configure a
multicast boundary on GE 1/0/0 or GE 2/0/0 for group G. Then, the interface no longer
forwards data to and receives data from group G.
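The per-interface boundary behavior can be modeled in a few lines. The interface names and group range below are invented examples:

```python
# Minimal sketch of a multicast boundary: once a boundary for a group range
# is configured on an interface, that interface neither sends nor receives
# packets for groups in the range.

from ipaddress import ip_address, ip_network

# Boundary configuration: interface -> bounded group ranges.
BOUNDARIES = {"GE1/0/0": [ip_network("239.1.1.0/24")]}

def allowed(interface: str, group: str) -> bool:
    """Whether the interface may send or receive packets for this group."""
    g = ip_address(group)
    return not any(g in net for net in BOUNDARIES.get(interface, []))

# Packets for 239.1.1.5 are blocked on GE1/0/0 in both directions, forming a
# closed forwarding area; other groups and interfaces are unaffected.
```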
Definition
Multicast VPN (MVPN) in Rosen Mode is based on the multicast domain (MD) scheme
defined in relevant standards. MVPN in Rosen Mode implements multicast service
transmission over MPLS/BGP VPNs.
Purpose
MVPN in Rosen Mode transmits multicast data and control messages of PIM instances in a
VPN over a public network to remote sites of the VPN.
With MVPN in Rosen Mode, a public network PIM instance (called a PIM P-instance) does
not need to know multicast data transmitted in a PIM VPN instance (called a PIM C-instance),
and a PIM C-instance does not need to know multicast data transmitted in a PIM P-instance.
Therefore, MVPN in Rosen Mode isolates multicast data between a PIM P-instance and a
PIM C-instance.
1.11.7.2 Principles
1.11.7.2.1 Basic Concepts
MD
A multicast domain (MD) is composed of VPN instances on PEs that can receive and
send multicast data between each other. A PE VPN instance can belong only to one MD.
Different VPN instances belong to different MDs. An MD serves a specific VPN. All
private multicast data transmitted in the VPN is transmitted in the MD.
Share-group
A share-group is a group that all PE VPN instances in the same MD should join. A VPN
instance can join a maximum of one share-group.
Share-MDT
A share-multicast distribution tree (share-MDT) transmits PIM protocol packets and data
packets between PEs in the same VPN instance. A share-MDT is built when PIM
C-instances join share-groups.
MTI
A multicast tunnel interface (MTI) is the outbound or inbound interface of a multicast
tunnel (MT) or an MD. MTIs are used to transmit VPN data between local and remote
PEs.
An MTI is regarded as a channel through which the public network instance and a VPN
instance communicate. An MTI connects a PE to an MT on a shared network segment
and sets up PIM neighbor relationships between PE VPN instances in the same MD.
Switch-group
A switch-group is a group that all PEs with attached VPN data receivers join.
Switch-groups are the basis of switch-MDT setup.
Switch-MDT
A switch-multicast distribution tree (switch-MDT) implements on-demand multicast data
transmission, so a switch-MDT transmits multicast data to only PEs that require the
multicast data. A switch-MDT can be built after a share-MDT is set up and VPN data
receivers' PEs join a switch-group.
The PIM C-instance on the local PE considers the MTI as a LAN interface and sets up a PIM
neighbor relationship with the remote PIM C-instance. The PIM C-instances then use the
MTIs to perform DR election, send Join/Prune messages, and transmit multicast data.
The PIM C-instances send PIM protocol packets or multicast data packets to the MTIs and the
MTIs encapsulate the received packets. The encapsulated packets are public network
multicast data packets that are forwarded by PIM P-instances. Therefore, an MT is actually a
multicast distribution tree on a public network.
VPNs use different MTs, and each MT uses a unique packet encapsulation mode, so
multicast data is isolated between VPNs.
PIM C-instances on PEs in the same VPN use the same MT and communicate through
this MT.
A VPN uniquely defines an MD, and an MD serves only one VPN. The VPN, MD, MTI, and
share-group are therefore all in a one-to-one relationship.
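The isolation provided by MTI encapsulation can be sketched conceptually. The field layout is simplified and the addresses are invented; real MD MVPN uses GRE encapsulation with the share-group as the outer destination:

```python
# Conceptual sketch: a PIM C-instance packet handed to the MTI is wrapped in
# a public-network multicast header addressed to the VPN's share-group, so
# P routers forward it as ordinary public multicast.

SHARE_GROUP = {"VPN_BLUE": "239.1.1.1", "VPN_GREEN": "239.2.2.2"}

def mti_encapsulate(vpn: str, pe_address: str, c_packet: bytes) -> dict:
    """Wrap a C-instance packet for transport across this VPN's MD."""
    return {
        "outer_src": pe_address,        # local PE address
        "outer_dst": SHARE_GROUP[vpn],  # share-group of this VPN's MD
        "payload": c_packet,            # original VPN multicast packet
    }

def mti_decapsulate(vpn: str, p_packet: dict):
    """A remote PE accepts only packets addressed to its own share-group."""
    if p_packet["outer_dst"] != SHARE_GROUP[vpn]:
        return None  # different MD: isolation between VPNs preserved
    return p_packet["payload"]
```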
239.1.1.1) entry. PE 2 and PE 3 also send Register messages to the RP. Then, three
independent RP-source trees that connect PEs to the RP are built in the MD.
On the PIM-SM network, an RPT (*, 239.1.1.1) and three independent RP-source trees
construct a share-MDT.
Background
According to the share-multicast distribution tree (share-MDT) establishment process
described in the previous section, the VPN instance bound to PE3 has no receivers, but PE3
still receives the VPN multicast data packets of the group (192.1.1.1, 225.1.1.1). This is a
defect of the multicast domain (MD) scheme: all PEs in the same MD receive multicast data
packets regardless of whether they have receivers, which wastes bandwidth and imposes an
extra burden on PEs.
To address this issue, MVPN provides an optimized solution, the switch-MDT, which enables
on-demand multicast transmission. Traffic is switched from the share-MDT to the
switch-MDT when the multicast traffic rate on a PE exceeds the specified threshold. Only the
PEs with attached receivers then receive multicast data from the switch-MDT, reducing the
load on PEs and bandwidth consumption.
Implementation
Figure 1-978 shows the switch-MDT implementation process based on the assumption that a
share-MDT has been successfully established.
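The switchover decision can be sketched as follows. The threshold and rate values are invented examples; the actual threshold is a configured value on the sender PE:

```python
# Hedged sketch of the share-MDT to switch-MDT switchover: when the rate of
# a VPN flow crosses a configured threshold, the sender PE moves the flow to
# a switch-MDT that only PEs with receivers join.

SWITCH_THRESHOLD_KBPS = 1000  # assumed configured threshold

def select_mdt(flow_rate_kbps: int) -> str:
    """Choose the distribution tree for a VPN (S, G) flow."""
    if flow_rate_kbps > SWITCH_THRESHOLD_KBPS:
        return "switch-MDT"  # delivered only to PEs with receivers
    return "share-MDT"       # delivered to every PE in the MD

# Low-rate flows stay on the share-MDT; a 5 Mbit/s flow is switched, sparing
# PEs without receivers from carrying it.
```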
Background
Rosen MVPN supports only intra-VPN multicast service distribution. To enable a service
provider on a VPN to provide multicast services for users on other VPNs, use MVPN
extranet.
Implementation
Table 1-281 describes the usage scenarios of MVPN extranet.
The address range of multicast groups using the MVPN extranet service cannot overlap that of
multicast groups using the intra-VPN service.
Only a static RP can be used in an MVPN extranet scenario, the same static RP address must be
configured on the source and receiver VPN sides, and the static RP address must belong to the
source VPN. If different RP addresses are configured, inconsistent multicast routing entries will be
created on the two instances, causing service forwarding failures.
To provide an SSM service using MVPN extranet, the same SSM group address must be configured
on the source and receiver VPN sides.
Remote Cross
Configure a source VPN instance on a receiver PE
On the network shown in Figure 1-979, VPN GREEN is configured on PE1; PE1
encapsulates packets with the share-group G1 address; CE1 connects to the multicast
source in VPN GREEN. VPN BLUE is configured on PE2; PE2 encapsulates packets
with the share-group G2 address; CE2 connects to the multicast source in VPN BLUE.
VPN BLUE is configured on PE3; PE3 encapsulates packets with the share-group G2
address; PE3 establishes a multicast distribution tree (MDT) with PE2 on the public
network. Users connected to CE3 need to receive multicast data from both VPN BLUE
and VPN GREEN.
Configure source VPN GREEN on PE3 and a multicast routing policy for receiver VPN
BLUE. Table 1-282 describes the implementation process.
Step 1 (CE3): CE3 receives an IGMP Report message from the receiver that requires data
from the multicast source in VPN GREEN and forwards the Join message to PE3 through
PIM.
Step 2 (PE3): After PE3 receives the PIM Join message from CE3 in VPN BLUE, it creates a
multicast routing entry. Through the RPF check, PE3 determines that the upstream interface
of the RPF route belongs to VPN GREEN. Then, PE3 adds the upstream interface (serving as
an extranet inbound interface) to the multicast routing table.
Step 3 (PE3): PE3 encapsulates the PIM Join message with the share-group G1 address of
VPN GREEN and sends the PIM Join message to PE1 in VPN GREEN over the public
network.
Step 4 (PE1): After PE1 receives the multicast data from the source in VPN GREEN, PE1
encapsulates the multicast data with the share-group G1 address of VPN GREEN and sends
the data to PE3 in VPN GREEN over the public network.
Step 5 (PE3): PE3 decapsulates and imports the received multicast data to receiver VPN
BLUE and sends the data to CE3. Then, CE3 forwards the data to the receiver in VPN BLUE.
Configure receiver VPN BLUE on PE1. No multicast routing policy is required. Table
1-283 describes the implementation process.
Step 1 (CE3): CE3 receives an IGMP Report message from the receiver that requires data
from the multicast source in VPN GREEN and forwards the Join message to PE3 through
PIM.
Step 2 (PE3): After PE3 receives the PIM Join message from CE3 in VPN BLUE, it
encapsulates the PIM Join message with the share-group G2 address of VPN BLUE and
sends the PIM Join message to PE1 in VPN BLUE over the public network.
Step 3 (PE1): PE1 imports the PIM Join message for VPN BLUE to VPN GREEN,
establishes a multicast routing entry in VPN GREEN, and adds the extranet outbound
interface and receiver VPN BLUE to the multicast routing entry.
Step 4 (PE1): PE1 imports the multicast data sent by the multicast source in VPN GREEN to
receiver VPN BLUE, encapsulates the multicast data with the share-group G2 address of
VPN BLUE, and sends the data to PE3 in VPN BLUE over the public network.
Step 5 (PE3): PE3 decapsulates and sends the received data to CE3. Then, CE3 forwards the
data to the receiver in VPN BLUE.
Local Cross
On the network shown in Figure 1-981, PE1 is the source PE of VPN BLUE. CE1 connects to
the multicast source in VPN BLUE. CE4 connects to the multicast source in VPN GREEN.
Both CE3 and CE4 reside on the same side of PE3. Users connected to CE3 need to receive
multicast data from both VPN BLUE and VPN GREEN.
Table 1-284 describes how MVPN extranet is implemented in the local crossing scenario.
Step 1 (CE3): CE3 receives an IGMP Report message from the receiver that requires data
from the multicast source in VPN GREEN and forwards the Join message to PE3 through
PIM.
Step 2 (PE3): After PE3 receives the PIM Join message, it creates a multicast routing entry of
VPN BLUE. Through the RPF check, PE3 determines that the upstream interface of the RPF
route belongs to VPN GREEN. PE3 then imports the PIM Join message to VPN GREEN.
Step 3 (PE3): PE3 creates a multicast routing entry in VPN GREEN, records receiver VPN
BLUE in the entry, and sends the PIM Join message to CE4 in VPN GREEN.
Step 4 (PE3): After CE4 receives the PIM Join message, it sends the multicast data from
VPN GREEN to PE3. PE3 then imports the multicast data to receiver VPN BLUE based on
the multicast routing entries of VPN GREEN.
Step 5 (PE3): PE3 sends the multicast data to CE3 based on the multicast routing entries of
VPN BLUE. Then, CE3 forwards the data to the receiver in VPN BLUE.
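The extranet RPF decision in the steps above can be modeled briefly. The prefixes and interface names are invented assumptions:

```python
# Simplified model of the extranet decision: when the RPF route for a Join
# received in the receiver VPN resolves to an interface owned by the source
# VPN, the Join is imported into the source VPN instance.

from ipaddress import ip_address, ip_network

# Per-VPN RPF routes on PE3: list of (prefix, upstream interface).
RPF_ROUTES = {
    "VPN_BLUE":  [(ip_network("10.2.0.0/16"), "IF_BLUE")],
    "VPN_GREEN": [(ip_network("10.1.0.0/16"), "IF_GREEN")],
}

def resolve_join(receiver_vpn: str, source_vpn: str, source: str):
    """Return (upstream interface, VPN owning it) for a Join's source."""
    s = ip_address(source)
    for vpn in (receiver_vpn, source_vpn):  # receiver VPN checked first
        for net, iface in RPF_ROUTES[vpn]:
            if s in net:
                return iface, vpn
    return None

# For a source in 10.1.0.0/16, a Join arriving in VPN BLUE resolves into
# VPN GREEN, so IF_GREEN becomes the extranet inbound interface.
```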
In MVPN extranet scenarios where the multicast source resides on a public network and the receiver
resides on a VPN, static routes to the multicast source and public network RP must be configured in the
receiver VPN instance. After the device where the receiver VPN instance resides imports the PIM join
message from the VPN instance to the public network instance and establishes a multicast routing entry,
the device can send multicast data from the public network instance to the VPN instance, and then to the
receivers. Multicast protocol and data packets can be directly forwarded to the receiver without the need
to be encapsulated and decapsulated by GRE.
Background
Multicast packets, including protocol packets and data packets, are transmitted from the
public network to a private network along a public network multicast distribution tree (MDT).
Public network MDTs are categorized into the following types:
PIM-SM MDT: an MDT established by sending PIM-SM Join messages to the
intermediate device RP. PIM-SM MDTs are used in scenarios in which the location of
the multicast source (multicast tunnel interface) is unknown.
PIM-SSM MDT: an MDT established by sending PIM-SSM Join messages to the
multicast source. PIM-SSM MDTs are used in scenarios in which the location of the
multicast source (multicast tunnel interface) is known.
Before the BGP A-D MVPN is introduced, MD MVPNs can establish only PIM-SM MDTs.
This is because PEs belonging to the same VPN cannot detect each other's peer information.
As a result, PEs belonging to the same VPN cannot detect the multicast source, and therefore
are unable to send PIM-SSM Join messages to the multicast source to establish a PIM-SSM
MDT.
After the BGP A-D MVPN is introduced, MD MVPNs can also establish PIM-SSM MDTs.
On a BGP A-D MVPN, PEs obtain and record peer information about a VPN by exchanging
BGP Update packets that carry A-D route information. Then, these PEs send PIM-SSM Join
messages directly to the multicast source to establish a PIM-SSM MDT. After the PIM-SSM
MDT is established, the BGP A-D MVPN transmits multicast services over a public network
tunnel based on the PIM-SSM MDT.
Related Concepts
The concepts related to BGP A-D MVPN are as follows:
MD MVPN: See 1.11.7.2.1 Basic Concepts.
Peer: a BGP speaker that exchanges messages with another BGP speaker.
BGP A-D: a mechanism in which PEs exchange BGP Update packets that carry A-D
route information to obtain and record peer information of a VPN.
Implementation
For multicast VPN in BGP A-D mode, only MDT-SAFI A-D, a new address family defined
by BGP, is supported. After a VPN instance is configured on a PE, the PE advertises the VPN
configuration, including the RD, share-group address, and IP address of the MTI, to all its
BGP peers. After a remote PE receives an MDT-SAFI
message advertised by BGP, the remote PE compares the Share-Group address in the message
with its Share-Group address. If the remote PE confirms that it is in the same VPN as the
sender of the MDT-SAFI message, the remote PE establishes the PIM-SSM MDT on the
public network to transmit multicast VPN services.
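The share-group comparison described above can be sketched as follows. The addresses are invented example values, and the MDT-SAFI update is reduced to two fields for illustration:

```python
# Hedged sketch of MDT-SAFI A-D processing: a PE compares the share-group
# address in a received MDT-SAFI update with its own; a match means the
# sender is in the same VPN, so a PIM-SSM (S, G) entry is created with the
# sender PE's address as the source.

LOCAL_SHARE_GROUP = "232.1.1.1"  # within the SSM group address range

def process_mdt_safi(update: dict):
    """Return the PIM-SSM (S, G) entry to create, or None for another VPN."""
    if update["share_group"] != LOCAL_SHARE_GROUP:
        return None
    # Source = advertising PE's address, group = the shared SSM group.
    return (update["pe_address"], update["share_group"])

# An update from PE1 for 232.1.1.1 yields the entry ("1.1.1.9", "232.1.1.1"),
# letting the local PE send a PIM-SSM Join directly toward PE1.
```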
As shown in Figure 1-982, PE1, PE2, and PE3 belong to VPN1, and join the share-group G1.
The address of G1 is within the SSM group address range. BGP MDT-SAFI A-D mode is
enabled on each PE. In addition, the BGP A-D function is enabled on VPN1. The site where
CE1 resides is connected to Source of VPN1, and CE2 and CE3 are connected to VPN1 users.
Based on the BGP A-D mechanism, every PE on the network obtains and records information
about all its BGP peers on the same VPN, and then directly establishes a PIM-SSM MDT on
the public network for transmitting multicast VPN services. In this manner, MVPN services
can be transmitted over a public network tunnel based on the PIM-SSM MDT.
The following uses PE3 as an example to describe service processing in MVPN in BGP A-D
mode:
1. After being configured with the BGP A-D function, PE1, PE2, and PE3 negotiate session
parameters, and confirm that both ends support the BGP A-D function. Then, the PEs
can establish BGP peer relationships. After receiving a BGP Update packet from PE1
and PE2, respectively, PE3 obtains and records the BGP peer addresses of PE1 and PE2.
The BGP Update packets carry the information about the PEs that send packets, such as
the PE address and supported tunnel type.
2. VPN1 is configured on PE3. PE3 joins the share-group G1. PE3 creates a PIM-SSM
entry with G1 being the group address and the address of PE1 being the source address
and another PIM-SSM entry with G1 being the group address and the address of PE2
being the source address. PE3 then directly sends PIM Join messages to PE1 and PE2 to
establish two PIM-SSM MDTs to PE1 and PE2, respectively.
3. CE3 sends a Join message to PE3. After receiving the Join message, PE3 encapsulates
the Join message with the PIM-SSM share-group address, and then sends it to PE1 over
the public network tunnel. PE1 then decapsulates the received Join message, and then
sends it to the multicast source.
4. After the multicast data sent by the multicast source reaches PE1, PE1 encapsulates the
multicast data with the share-group address, and then forwards it to PE3 over the public
network tunnel. PE3 then forwards the multicast data to CE3, and CE3 sends the
multicast data to the user.
The following example uses VPN BLUE to describe how multicast services are isolated
between VPNs.
1. After a share-multicast distribution tree (MDT) is established for the BLUE instances,
the two BLUE instances connected with CE 1 and CE 2 exchange multicast protocol
packets through a multicast tunnel (MT).
2. Multicast devices in the BLUE instances can then establish neighbor relationships, and
send Join, Prune, and BSR messages to each other. The protocol packets in the BLUE
instances are encapsulated and decapsulated only on the MTs of the PEs. The PEs are
unaware they are on VPN networks, so they process the multicast protocol packets and
forward multicast data packets like devices on a public network. Multicast data is
transmitted in the same MD, but isolated from VPN instances in other MDs.
Purpose
BGP/MPLS IP VPNs are widely deployed as they provide excellent reliability and security.
Meanwhile, IP multicast is gaining increasing popularity among service providers as it
provides highly efficient point-to-multipoint (P2MP) traffic transmission. Rapidly developing
multicast applications, such as IPTV, video conference, and distance education, impose
increasing requirements on network reliability, security, and efficiency. As a result, service
providers' demand for delivering multicast services over BGP/MPLS IP VPNs is also
increasing. In this context, the multicast virtual private network (MVPN) solution is
developed. The MVPN technology, when applied to a BGP/MPLS IP VPN, can transmit VPN
multicast traffic to remote VPN sites across the public network.
Rosen MVPNs establish multicast distribution trees (MDTs) using Protocol Independent
Multicast (PIM) to transmit VPN multicast protocol and data packets, and have the following
limitations:
VPN multicast protocol and data packets must be transmitted using the MDT, which
complicates network deployment because the multicast function must be enabled on the
public network.
The public network uses GRE for multicast packet encapsulation and cannot leverage the
MPLS advantages, such as high reliability, QoS guarantee, and TE bandwidth
reservation, of existing BGP/MPLS IP VPNs.
Next-generation (NG) MVPNs, which have made improvements over Rosen MVPNs, have
the following characteristics:
The public network uses BGP to transmit VPN multicast protocol packets and routing
information. Multicast protocols do not need to be deployed on the public network,
simplifying network deployment and maintenance.
The public network uses the mature label-based forwarding and tunnel protection
techniques of MPLS, improving multicast service quality and reliability.
Definition
The NG MVPN is a new framework designed to transmit IP multicast traffic across a
BGP/MPLS IP VPN. An NG MVPN uses BGP to transmit multicast protocol packets, and
uses PIM-SM, PIM-SSM, P2MP TE, or mLDP to transmit multicast data packets. The NG
MVPN enables unicast and multicast services to be delivered using the same VPN
architecture.
Figure 1-984 shows a typical NG MVPN networking scenario, and Table 1-285 lists the roles
of different entities on an NG MVPN.
Benefits
NG MVPNs, which implement hierarchical forwarding of multicast data and control packets
on BGP/MPLS IP VPNs, offer the following benefits:
Better security by transmitting VPN multicast data over BGP/MPLS IP VPNs.
Better network maintainability by reducing network deployment complexity.
Better service quality and reliability by using mature label-based forwarding and tunnel
protection techniques of MPLS.
PMSI Tunnel
Public tunnels (P-tunnels) are transport mechanisms used to forward VPN multicast traffic
across service provider networks. In NE20E, PMSI tunnels can be carried over RSVP-TE
P2MP or mLDP P2MP tunnels. Table 1-286 lists the differences between RSVP-TE P2MP
tunnels and mLDP P2MP tunnels.
Table 1-286 Differences between RSVP-TE P2MP tunnels and mLDP P2MP tunnels
Tunnel Type: RSVP-TE P2MP tunnel
Establishment: Established from the root node.
Characteristics: RSVP-TE P2MP tunnels support bandwidth reservation and can ensure
service quality during network congestion. Use RSVP-TE P2MP tunnels to carry PMSI
tunnels if high service quality is required.
Tunnel Type: mLDP P2MP tunnel
Establishment: Established from a leaf node.
Characteristics: mLDP P2MP tunnels do not support bandwidth reservation and cannot
ensure service quality during network congestion. Configuring an mLDP P2MP tunnel,
however, is easier than configuring an RSVP-TE P2MP tunnel. Use mLDP P2MP tunnels to
carry PMSI tunnels if high service quality is not required.
Theoretically, a P-tunnel can carry the traffic of one or multiple MVPNs. However, in NE20E,
a P-tunnel can carry the traffic of only one MVPN.
On an MVPN that uses BGP as the signaling protocol, a sender PE distributes information
about the P-tunnel in a new BGP attribute called PMSI. PMSI tunnels are the logical tunnels
used by the public network to transmit VPN multicast data, and P-tunnels are the actual
tunnels used by the public network to transmit VPN multicast data. A sender PE uses PMSI
tunnels to send specific VPN multicast data to receiver PEs. A receiver PE uses PMSI tunnel
information to determine which multicast data is sent by the multicast source on the same
MVPN as itself. There are two types of PMSI tunnels: I-PMSI tunnels and S-PMSI
tunnels. Table 1-287 lists the differences between I-PMSI and S-PMSI tunnels.
MVPN Targets
MVPN targets are used to control MVPN A-D route advertisement. MVPN targets function in
a similar way as VPN targets used on unicast VPNs and are also classified into two types:
Export MVPN target: A PE adds the export MVPN target to an MVPN A-D route before
advertising the route.
Import MVPN target: After receiving an MVPN A-D route from another PE, a PE
matches the export MVPN target of the route against the import MVPN targets of its
VPN instances. If the export MVPN target matches the import MVPN target of a VPN
instance, the PE accepts the MVPN A-D route and records the sender PE as an MVPN
member. If the export MVPN target does not match the import MVPN target of any VPN
instance, the PE drops the MVPN A-D route.
By default, if you do not configure MVPN targets for an MVPN, MVPN A-D routes carry the VPN
target communities that are attached to unicast VPN-IPv4 routes. If the unicast and multicast network
topologies are congruent, you do not need to configure MVPN targets for MVPN A-D routes. If they are
not congruent, configure MVPN targets for MVPN A-D routes.
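The matching behavior described above can be sketched briefly. The target values and instance names are invented examples, mirroring how unicast VPN targets work:

```python
# Sketch of MVPN A-D route filtering by MVPN targets: a receiving PE matches
# the route's export MVPN targets against the import MVPN targets of its VPN
# instances, accepting the route and recording the sender on a match.

VPN_INSTANCES = {
    "vpn1": {"import_targets": {"100:1"}},
    "vpn2": {"import_targets": {"200:1"}},
}

def accept_ad_route(export_targets: set, sender_pe: str):
    """Return the VPN instances that accept the route; the sender PE is
    recorded as an MVPN member. An empty list means the route is dropped."""
    members = []
    for name, inst in VPN_INSTANCES.items():
        if export_targets & inst["import_targets"]:
            members.append((name, sender_pe))
    return members

# A route exported with target 100:1 is accepted only by vpn1; a route with
# no matching target is dropped.
```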
On the network shown in Figure 1-985, PE1 and PE2 are both sender PEs, and PE3 is a
receiver PE. PE1 and PE2 connect to both vpn1 and vpn2. On PE1, the VRF Route
Import Extended Community is 1.1.1.9:1 for vpn1 and 1.1.1.9:2 for vpn2; on PE2, the
VRF Route Import Extended Community is 2.2.2.9:1 for vpn1 and 2.2.2.9:2 for vpn2.
After PE1 and PE2 both establish BGP MVPN peer relationships with PE3, PE1 and PE2
both send to PE3 a VPNv4 route destined for the multicast source 192.168.1.2. The VRF
Route Import Extended Community carried in the VPNv4 route sent by PE1 is 1.1.1.9:1
and that carried in the VPNv4 route sent by PE2 is 2.2.2.9:1. After PE3 receives the two
VPNv4 routes, PE3 adds the preferred route (VPNv4 route sent by PE1 in this example)
to the vpn1 routing table and stores the VRF Route Import Extended Community value
carried in the preferred route locally for later BGP C-multicast route generation.
Upon receipt of a PIM Join message from CE3, PE3 generates a BGP C-multicast route
with the RT-import attribute and sends this route to PE1 and PE2. The RT-import
attribute value of this route is the same as the locally stored VRF Route Import Extended
Community value, 1.1.1.9:1. Then,
− Upon receipt of the BGP C-multicast route, PE1 checks the RT-import attribute of
this route. After PE1 finds that the Administrator field value is 1.1.1.9, which is the
same as its local MVPN ID, PE1 accepts this route and adds it to the vpn1 routing
table based on the Local Administrator field value, 1.
− Upon receipt of the BGP C-multicast route, PE2 also checks the RT-import attribute
of this route. After PE2 finds that the Administrator field value is 1.1.1.9, a value
different from its local MVPN ID 2.2.2.9, PE2 drops this route.
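The RT-import check performed by PE1 and PE2 above can be sketched as follows. The attribute is reduced to an "Administrator:Local Administrator" string for illustration:

```python
# Sketch of the RT-import check: a sender PE accepts a BGP C-multicast route
# only when the Administrator field of its RT-import attribute equals the
# PE's own MVPN ID; the Local Administrator field then selects the VPN.

def handle_c_multicast(route_rt_import: str, local_mvpn_id: str):
    """Return the Local Administrator value (VPN selector), or None to drop."""
    administrator, local_admin = route_rt_import.split(":")
    if administrator != local_mvpn_id:
        return None              # route belongs to another PE: dropped
    return int(local_admin)      # e.g. 1 selects the vpn1 routing table

# PE1 (MVPN ID 1.1.1.9) accepts a route with RT-import 1.1.1.9:1 and installs
# it for vpn1; PE2 (MVPN ID 2.2.2.9) drops the same route.
```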
To transmit multicast traffic from multicast sources to multicast receivers, sender PEs must
establish BGP MVPN peer relationships with receiver PEs. On the network shown in Figure
1-986, PE1 serves as a sender PE, and PE2 and PE3 serve as receiver PEs. Therefore, PE1
establishes BGP MVPN peer relationships with PE2 and PE3.
PEs on an NG MVPN use BGP Update messages to exchange MVPN information. MVPN
information is carried in the network layer reachability information (NLRI) field of a BGP
Update message. The NLRI containing MVPN information is also called the MVPN NLRI.
For more information about the MVPN NLRI, see MVPN NLRI.
For a comparison between RSVP-TE and mLDP P2MP tunnels, see 1.11.8.2.1 Control Plane
Overview.
The following example uses the network shown in Figure 1-987 to describe how to establish
PMSI tunnels. Because RSVP-TE P2MP tunnels and mLDP P2MP tunnels are established
differently, the following uses two scenarios, RSVP-TE P2MP Tunnel and mLDP P2MP
Tunnel, to describe how to establish PMSI tunnels.
This example presumes that:
PE1 has established BGP MVPN peer relationships with PE2 and PE3, but no BGP
MVPN peer relationship is established between PE2 and PE3.
The network administrator has configured MVPN on PE1, PE2, and PE3 in turn.
Figure 1-988 Time sequence for establishing an I-PMSI tunnel with the P-tunnel type as
RSVP-TE P2MP LSP
Table 1-288 briefs the procedure for establishing an I-PMSI tunnel with the P-tunnel type as
RSVP-TE P2MP LSP.
Table 1-288 Procedure for establishing an I-PMSI tunnel with the P-tunnel type as RSVP-TE
P2MP LSP
PE3 joins the RSVP-TE P2MP tunnel rooted at PE1 in a similar way as PE2. After PE2 and
PE3 both join the RSVP-TE P2MP tunnel rooted at PE1, the I-PMSI tunnel is established and
the MVPN service becomes available.
Figure 1-989 Time sequence for establishing an I-PMSI tunnel with the P-tunnel type as mLDP
P2MP LSP
Table 1-289 briefs the procedure for establishing an I-PMSI tunnel with the P-tunnel type as
mLDP P2MP LSP.
Table 1-289 Procedure for establishing an I-PMSI tunnel with the P-tunnel type as mLDP P2MP
LSP
PE1
Prerequisites: BGP and MVPN have been configured on PE1. PE1 has been configured as a sender PE. The P-tunnel type for I-PMSI tunnel establishment has been specified as mLDP P2MP LSP.
Key action: As a sender PE, PE1 initiates the I-PMSI tunnel establishment process. The MPLS module on PE1 reserves resources (FEC information such as the opaque value and root node address) for the corresponding mLDP P2MP tunnel. Because PE1 does not know leaf information of the mLDP P2MP tunnel, the mLDP P2MP tunnel is not established in a real sense.
PE1
Prerequisites: BGP and MVPN have been configured on PE2. PE1 has established a BGP MVPN peer relationship with PE2.
Key action: PE1 sends a Type 1 BGP A-D route to PE2. This route carries the following information:
An MVPN target (see 1.11.8.2.1 Control Plane Overview): used to control A-D route advertisement. The Type 1 BGP A-D route carries the export MVPN target configured on PE1.
The PMSI Tunnel attribute (see 1.11.8.4 NG MVPN Control Messages): specifies the P-tunnel type (mLDP P2MP in this case) used for PMSI tunnel establishment. This attribute carries information about resources reserved by MPLS for the mLDP P2MP tunnel in the preceding step.
PE2 - After PE2 receives the BGP A-D route from PE1,
the MPLS module on PE2 sends a Label
Mapping message to PE1. This is because the
PMSI Tunnel attribute carried in the received
route specifies the P-tunnel type as mLDP,
meaning that the P2MP tunnel must be
established from leaves.
After PE2 receives the MPLS reply from PE1, PE2 becomes aware that the P2MP tunnel
has been established. For more information about
mLDP P2MP tunnel establishment, see "mLDP"
in NE20E Feature Description - MPLS.
PE2 - PE2 creates an mLDP P2MP tunnel rooted at
PE1.
PE2 - PE2 sends a BGP A-D route that carries the
export MVPN target to PE1. Because PE2 is not
a sender PE configured with PMSI tunnel
information, the BGP A-D route sent by PE2
does not carry the PMSI Tunnel attribute.
After PE1 receives the BGP A-D route from PE2, PE1 matches the export MVPN target of the route
against its local import MVPN target. If the two targets match, PE1 accepts the route;
otherwise, PE1 drops the route.
PE3 joins the mLDP P2MP tunnel and MVPN in a similar way as PE2. After PE2 and PE3
both join the mLDP P2MP tunnel rooted at PE1, the I-PMSI tunnel is established and the
MVPN service becomes available.
Background
An NG MVPN uses the I-PMSI tunnel to send multicast data to receivers. The I-PMSI tunnel
connects to all PEs on the MVPN and sends multicast data to these PEs regardless of whether
these PEs have receivers. If some PEs do not have receivers, this implementation will cause
redundant traffic, wasting bandwidth resources and increasing PEs' burdens.
To solve this problem, S-PMSI tunnels are introduced. An S-PMSI tunnel connects to the
sender and receiver PEs of specific multicast sources and groups on an NG MVPN. Compared
with the I-PMSI tunnel, an S-PMSI tunnel sends multicast data only to PEs interested in the
data, reducing bandwidth consumption and PEs' burdens.
For a comparison between I-PMSI and S-PMSI tunnels, see 1.11.8.2.1 Control Plane Overview.
Implementation
The following example uses the network shown in Figure 1-990 to describe switching
between I-PMSI and S-PMSI tunnels on an NG MVPN.
After multicast data is switched from the I-PMSI tunnel to an S-PMSI tunnel, if the S-PMSI tunnel
fails but the I-PMSI tunnel is still available, multicast data will be switched back to the I-PMSI
tunnel.
After multicast data is switched from the I-PMSI tunnel to an S-PMSI tunnel, if the multicast data
forwarding rate is consistently below the specified switching threshold but the I-PMSI tunnel is
unavailable, multicast data still travels along the S-PMSI tunnel.
Figure 1-991 Time sequence for switching from the I-PMSI tunnel to an RSVP-TE S-PMSI
tunnel
Table 1-291 Procedure for switching from the I-PMSI tunnel to an RSVP-TE S-PMSI tunnel
PE1 After PE1 detects that the multicast data forwarding rate exceeds the
specified switching threshold, PE1 initiates switching from the I-PMSI
tunnel to an S-PMSI tunnel by sending a BGP S-PMSI A-D route to its BGP peers.
After PE3 has downstream receivers, PE3 will send a BGP Leaf A-D route to PE1. Upon
receipt of the route, PE1 adds PE3 as a leaf node of the RSVP-TE S-PMSI tunnel. After
PE3 joins the tunnel, PE3's downstream receivers will also be able to receive multicast
data.
Switching from the I-PMSI Tunnel to an mLDP S-PMSI Tunnel
Figure 1-992 shows the time sequence for switching from the I-PMSI tunnel to an mLDP
S-PMSI tunnel. Table 1-292 describes the specific switching procedure.
Figure 1-992 Time sequence for switching from the I-PMSI tunnel to an mLDP S-PMSI
tunnel
Table 1-292 Procedure for switching from the I-PMSI tunnel to an mLDP S-PMSI tunnel
After PE3 has downstream receivers, PE3 will also directly join the mLDP S-PMSI
tunnel. Then, PE3's downstream receivers will also be able to receive multicast data.
PE1 starts a switch-delay timer upon the completion of S-PMSI tunnel establishment and determines
whether to switch multicast data to the S-PMSI tunnel as follows:
If the S-PMSI tunnel fails to be established, PE1 still uses the I-PMSI tunnel to send multicast data.
If the multicast data forwarding rate is consistently below the specified switching threshold
throughout the timer lifecycle, PE1 still uses the I-PMSI tunnel to transmit multicast data.
If the multicast data forwarding rate is consistently above the specified switching threshold
throughout the timer lifecycle, PE1 switches data to the S-PMSI tunnel for transmission.
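The switch-delay decision described above can be sketched as a simple predicate. This is an assumed simplified model (sampling the forwarding rate over the timer lifecycle), not the actual NE20E state machine:

```python
# Sketch: PE1 moves multicast data to the S-PMSI tunnel only if the
# tunnel was established AND the forwarding rate stayed above the
# switching threshold for the whole switch-delay timer lifecycle.
def select_tunnel(s_pmsi_established, rate_samples, threshold):
    if not s_pmsi_established:
        return "I-PMSI"                     # tunnel setup failed
    if all(rate > threshold for rate in rate_samples):
        return "S-PMSI"                     # rate consistently above threshold
    return "I-PMSI"                         # rate dipped during the timer

assert select_tunnel(False, [200, 300], 100) == "I-PMSI"
assert select_tunnel(True, [50, 60], 100) == "I-PMSI"
assert select_tunnel(True, [150, 200], 100) == "S-PMSI"
```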
Figure 1-993 Time sequence for switching from an S-PMSI tunnel to the I-PMSI tunnel
Table 1-293 Procedure for switching from an S-PMSI tunnel to the I-PMSI tunnel
Step Device Key Action
PE1 After PE1 detects that the multicast data forwarding rate is
consistently below the specified switching threshold, PE1 starts a
switchback hold timer:
If the multicast data forwarding rate is consistently above the
specified switching threshold throughout the timer lifecycle, PE1
still uses the S-PMSI tunnel to send traffic.
If the multicast data forwarding rate is consistently below the
specified switching threshold throughout the timer lifecycle, PE1
switches multicast data to the I-PMSI tunnel for transmission.
Meanwhile, PE1 sends a BGP Withdraw S-PMSI A-D route to PE2.
PE2 Upon receipt of the BGP Withdraw S-PMSI A-D route, PE2
withdraws the bindings between its multicast entries and the
S-PMSI tunnel. If PE2 has sent a BGP Leaf A-D route to PE1, PE2
will send a BGP Withdraw Leaf A-D route to PE1 in this step.
PE2 After PE2 detects that none of its multicast entries is bound to the
S-PMSI tunnel, PE2 leaves the S-PMSI tunnel.
PE1 PE1 deletes the S-PMSI tunnel after waiting for a specified period
of time.
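The switchback-hold decision in Table 1-293 can be sketched the same way; again a simplified assumption (rate samples over the timer lifecycle), not the real implementation:

```python
# Sketch: PE1 switches multicast data back to the I-PMSI tunnel only
# if the forwarding rate stayed below the switching threshold for the
# entire switchback hold timer lifecycle.
def switchback(rate_samples, threshold):
    if all(rate < threshold for rate in rate_samples):
        return "I-PMSI"   # switch back and withdraw the S-PMSI A-D route
    return "S-PMSI"       # rate recovered; keep using the S-PMSI tunnel

assert switchback([10, 20, 30], 100) == "I-PMSI"
assert switchback([10, 200, 30], 100) == "S-PMSI"
```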
In an RSVP-TE P2MP tunnel dual-root 1+1 protection scenario, S-PMSI tunnels must be carried over
RSVP-TE P2MP tunnels. The I-PMSI/S-PMSI switching processes in this scenario are similar to those
described above except that the leaf PEs need to start a tunnel status check delay timer:
Before the timer expires, leaf PEs delete tunnel protection groups to skip the status check of the
primary I-PMSI or S-PMSI tunnel. The leaf PEs select the multicast data received from the primary
tunnel and discard the multicast data received from the backup tunnel.
After the timer expires, leaf PEs start to check the primary I-PMSI or S-PMSI tunnel status again.
Leaf PEs select the multicast data received from the primary tunnel only if the primary tunnel is Up.
If the primary tunnel is Down, leaf PEs select the multicast data received from the backup tunnel.
Figure 1-994 shows the procedure for joining a multicast group, and Table 1-294 describes
this procedure.
PE1 After PE1 receives a unicast route destined for the multicast source from
CE1, PE1 converts this route to a VPNv4 route, adds the Source AS
Extended Community and VRF Route Import Extended Community to this
route, and advertises this route to PE2.
For more information about the Source AS Extended Community and VRF
Route Import Extended Community, see MVPN Extended Community
Attributes.
PE2 After PE2 receives the VPNv4 route from PE1, PE2 matches the export
VPN target of the route against its local import VPN target:
If the two targets match, PE2 accepts the VPNv4 route and stores the
Source AS Extended Community and VRF Route Import Extended
Community values carried in this route locally for later generation of the
BGP C-multicast route.
If the two targets do not match, PE2 drops the VPNv4 route.
CE2 After CE2 receives an IGMP join request, CE2 sends a PIM-SSM Join
message to PE2.
PE2 After PE2 receives the PIM-SSM Join message:
PE2 generates a multicast entry. In this entry, the downstream interface
is the interface that receives the PIM-SSM Join message, and the
upstream interface is the interface on the path to the multicast source.
CE1 After CE1 receives the PIM-SSM Join message, CE1 generates a multicast
entry. In this entry, the downstream interface is the interface that receives
the PIM-SSM Join message. After that, the multicast receiver successfully
joins the multicast group, and CE1 can send multicast traffic to CE2.
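The VPN-target check that PE2 performs on the received VPNv4 route, and the caching of the two MVPN extended communities, can be sketched as follows (an illustrative model; the dictionary layout and community string forms are assumptions):

```python
# Sketch: a receiver PE accepts a VPNv4 route only if the route's
# export VPN targets intersect the PE's local import VPN targets,
# then caches the MVPN extended communities for later use.
def match_vpn_targets(route_export_rts, local_import_rts):
    return bool(set(route_export_rts) & set(local_import_rts))

cache = {}
route = {"export_rts": ["100:1"], "source_as": "65001:0",
         "vrf_route_import": "1.1.1.9:1"}
if match_vpn_targets(route["export_rts"], ["100:1", "200:1"]):
    # Store the communities for later BGP C-multicast route generation.
    cache["source_as"] = route["source_as"]
    cache["vrf_route_import"] = route["vrf_route_import"]

assert cache["vrf_route_import"] == "1.1.1.9:1"
# A route whose targets do not match is dropped.
assert not match_vpn_targets(["300:1"], ["100:1", "200:1"])
```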
Figure 1-995 shows the procedure for leaving a multicast group, and Table 1-295 describes
this procedure.
CE2 CE2 detects that a multicast receiver attached to itself leaves the multicast
group.
PE2 PE2 deletes the corresponding multicast entry after this entry ages out.
Then, PE2 generates a BGP Withdraw message.
PE2 PE2 sends the BGP Withdraw message to PE1.
PE1 After PE1 receives the BGP Withdraw message, PE1 deletes the
corresponding multicast entry and generates a PIM-SSM Prune message.
PE1 PE1 sends the PIM-SSM Prune message to CE1.
CE1 After CE1 receives the PIM-SSM Prune message, CE1 stops sending
multicast traffic to CE2.
Table 1-296 Implementation modes of PIM (*, G) multicast joining and leaving
Not across the public network: PIM (*, G) entries are converted to PIM (S, G) entries
before being transmitted to remote PEs across the public network. Because PIM (*, G)
entries are not transmitted across the public network, the performance requirements for
PEs are lowered. The private network RP can be either a static RP or a dynamic RP, and
can be deployed on either a PE or a CE. If a CE serves as the private network RP, the CE
must establish an MSDP peer relationship with the corresponding PE.
The advertisement of VPNv4 routes during multicast joining and leaving in PIM (*, G) mode is similar
to that in PIM (S, G) mode. For more information, see Table 1-294.
PIM (*, G) multicast joining and leaving across the public network
On the network shown in Figure 1-996, CE3 serves as the RP. Figure 1-997 shows the
time sequence for establishing an RPT.
Figure 1-996 Networking for PIM (*, G) multicast joining and leaving
CE2 After CE2 receives an IGMP join request, CE2 sends a PIM (*, G) Join
message to PE2.
PE2 After PE2 receives the PIM (*, G) Join message: PE2 generates a PIM (*,
G) entry. In this entry, the downstream interface is the interface that
receives the PIM (*, G) Join message and the upstream interface is the
P2MP tunnel interface on the path to the RP. In this case, the upstream
interface is the interface used by PE3 to connect to PE2. PE2 generates a
BGP C-multicast route (Shared Tree Join route) based on the locally stored
Source AS Extended Community and VRF Route Import Extended
Community values. The RT-import attribute of this route is set to the
locally stored VRF Route Import Extended Community value. PE2 sends
the BGP C-multicast route to PE3, its BGP peer.
NOTE
For more information about BGP C-multicast route generation, see Table 1-294.
PE3 After PE3 receives the BGP C-multicast route (Shared Tree Join route):
1. PE3 checks the Administrator field and Local Administrator field values
in the RT-import attribute of the BGP C-multicast route. After PE3
confirms that the Administrator field value is the same as its local
MVPN ID, PE3 accepts the BGP C-multicast route.
2. PE3 determines the VPN instance routing table to which the BGP
C-multicast route should be added, based on the Local Administrator field
value in the RT-import attribute of the route.
3. PE3 adds the BGP C-multicast route to the corresponding VPN instance
routing table and creates a VPN multicast entry to guide multicast traffic
forwarding. In the multicast entry, the downstream interface is PE3's
P2MP tunnel interface.
4. PE3 converts the BGP C-multicast route to a PIM (*, G) Join message
and sends this message to CE3.
CE3 Upon receipt of the PIM (*, G) Join message, CE3 generates a PIM (*, G)
entry. In this entry, the downstream interface is the interface that receives
the PIM (*, G) Join message. Then, an RPT rooted at CE3 and with CE2 as
the leaf node is established.
CE1 After CE1 receives multicast traffic from the multicast source, CE1 sends a
PIM Register message to CE3.
CE3 Upon receipt of the PIM Register message, CE3 generates a PIM (S, G)
entry, which inherits the outbound interface of the previously generated
PIM (*, G) entry. Meanwhile, CE3 sends multicast traffic to PE3.
PE3 Upon receipt of the multicast traffic, PE3 generates a PIM (S, G) entry,
which inherits the outbound interface of the previously generated PIM (*,
G) entry. Because the outbound interface of the PIM (*, G) entry is a
P2MP tunnel interface, multicast traffic is imported to the I-PMSI tunnel.
PE2 Upon receipt of the multicast traffic, PE2 generates a PIM (S, G) entry,
which inherits the outbound interface of the previously generated PIM (*,
G) entry.
CE2 Upon receipt of the multicast traffic, CE2 sends the multicast traffic to
multicast receivers.
When the multicast traffic sent by the multicast source exceeds the threshold set on CE2,
CE2 initiates RPT-to-SPT switching. Figure 1-998 shows the time sequence for
switching an RPT to an SPT.
When the receiver PE receives multicast traffic transmitted along the RPT, the receiver PE immediately
initiates RPT-to-SPT switching. The RPT-to-SPT switching process on the receiver PE is similar to that
on CE2.
CE2 After the received multicast traffic exceeds the set threshold, CE2 initiates
RPT-to-SPT switching by sending a PIM (S, G) Join message to PE2.
PE2 Upon receipt of the PIM (S, G) Join message, PE2 updates the outbound
interface status in its PIM (S, G) entry, and switches the PIM (S, G) entry to
the SPT. Then, PE2 searches its multicast routing table for a route to the
multicast source. After PE2 finds that the upstream device on the path to
the multicast source is PE1, PE2 sends a BGP C-multicast route (Source
Tree Join route) to PE1.
PE1 Upon receipt of the BGP C-multicast route (Source Tree Join route), PE1
generates a PIM (S, G) entry, and sends a PIM (S, G) Join message to CE1.
CE1 Upon receipt of the PIM (S, G) Join message, CE1 generates a PIM (S, G)
entry. Then, the RPT-to-SPT switching is complete and CE1 can send
multicast traffic to PE1.
PE1 To prevent duplicate multicast traffic, PE1 carries the PIM (S, G) entry
information in a Source Active AD route and sends the route to all its BGP
peers.
PE3 Upon receipt of the Source Active AD route, PE3 records the route. After
RPT-to-SPT switching, PE3, the ingress of the P2MP tunnel for the RPT,
discards received multicast traffic, generates the (S, G, RPT) state, and sends
a PIM (S, G, RPT) Prune message to its upstream device. Meanwhile, PE3 updates its VPN
multicast routing entries and stops forwarding multicast traffic.
NOTE
To prevent packet loss during RPT-to-SPT switching, the PIM (S, G, RPT) Prune
operation is performed after a short delay.
PE2 Upon receipt of the Source Active AD route, PE2 records the route.
Because the Source Active AD route carries information about the PIM (S,
G) entry for the RPT, PE2 initiates RPT-to-SPT switching. After PE2 sends
a BGP C-multicast route (Source Tree Join route) to PE1, PE2 can receive
multicast traffic from PE1.
Figure 1-999 shows the time sequence for leaving a multicast group in PIM (*, G) mode.
Figure 1-999 Time sequence for leaving a multicast group in PIM (*, G) mode
Table 1-299 describes the procedure for leaving a multicast group in PIM (*, G) mode.
Table 1-299 Procedure for leaving a multicast group in PIM (*, G) mode
CE2 After CE2 detects that a multicast receiver attached to itself leaves the
multicast group, CE2 sends a PIM (*, G) Prune message to PE2. If CE2 has
switched to the SPT, CE2 also sends a PIM (S, G) Prune message to PE2.
PE2 Upon receipt of the PIM (*, G) Prune message, PE2 deletes the
corresponding PIM (*, G) entry. Upon receipt of the PIM (S, G) Prune
message, PE2 deletes the corresponding PIM (S, G) entry.
PE2 PE2 sends a BGP Withdraw message (Shared Tree Join route) to PE3 and a
BGP Withdraw message (Source Tree Join route) to PE1.
PE1 Upon receipt of the BGP Withdraw message (Source Tree Join route), PE1
deletes the previously recorded BGP C-multicast route (Source Tree Join
route) as well as the outbound interface in the PIM (S, G) entry.
PE3 Upon receipt of the BGP Withdraw message (Shared Tree Join route), PE3
deletes the previously recorded BGP C-multicast route (Shared Tree Join
route) as well as the outbound interface in the PIM (S, G) entry.
PIM (*, G) multicast joining and leaving not across the public network
On the network shown in Figure 1-996, each site of the MVPN is a PIM-SM BSR domain.
A PE serves as the RP. Figure 1-1000 shows the time sequence for joining a multicast
group when a PE serves as the RP.
Figure 1-1000 Time sequence for joining a multicast group when a PE serves as the RP
Table 1-300 describes the procedure for joining a multicast group when a PE serves as
the RP.
Table 1-300 Procedure for joining a multicast group when a PE serves as the RP
CE2 After CE2 receives an IGMP join request, CE2 sends a PIM (*, G) Join
message to PE2.
PE2 Upon receipt of the PIM (*, G) Join message, PE2 generates a PIM (*, G)
entry. Because PE2 is the RP, PE2 does not send the BGP C-multicast route
(Shared Tree Join route) to other devices. Then, an RPT rooted at PE2 and
with CE2 as the leaf node is established.
CE1 After CE1 receives multicast traffic from the multicast server, CE1 sends a
PIM Register message to PE1.
PE1 Upon receipt of the PIM Register message, PE1 generates a PIM (S, G)
entry.
PE1 PE1 sends a Source Active AD route to all its BGP peers.
PE2 Upon receipt of the Source Active AD route, PE2 generates a PIM (S, G)
entry, which inherits the outbound interface of the previously generated PIM
(*, G) entry.
PE2 PE2 initiates RPT-to-SPT switching and sends a BGP C-multicast route
(Source Tree Join route) to PE1.
PE1 Upon receipt of the BGP C-multicast route (Source Tree Join route), PE1
imports multicast traffic to the I-PMSI tunnel based on the corresponding
VPN multicast forwarding entry. Then, multicast traffic is transmitted over
the I-PMSI tunnel to CE2.
Figure 1-1001 shows the time sequence for leaving a multicast group when a PE serves
as the RP.
Figure 1-1001 Time sequence for leaving a multicast group when a PE serves as the RP
Table 1-301 describes the procedure for leaving a multicast group when a PE serves as
the RP.
Table 1-301 Procedure for leaving a multicast group when a PE serves as the RP
CE2 After CE2 detects that a multicast receiver attached to itself leaves the
multicast group, CE2 sends a PIM (*, G) Prune message to PE2.
PE2 Upon receipt of the PIM (*, G) Prune message, PE2 deletes the
corresponding PIM (*, G) entry.
CE2 CE2 sends a PIM (S, G) Prune message to PE2.
PE2 Upon receipt of the PIM (S, G) Prune message, PE2 deletes the
corresponding PIM (S, G) entry. PE2 sends a BGP Withdraw message
(Source Tree Join route) to PE1.
PE1 Upon receipt of the BGP Withdraw message (Source Tree Join route), PE1
deletes the previously recorded BGP C-multicast route (Source Tree Join
route) as well as the outbound interface in the PIM (S, G) entry.
Meanwhile, PE1 sends a PIM (S, G) Prune message to CE1.
CE1 Upon receipt of the PIM (S, G) Prune message, CE1 stops sending
multicast traffic to CE2.
On the network shown in Figure 1-996, each site of the MVPN is a PIM-SM BSR domain.
A CE serves as the RP. CE3 has established an MSDP peer relationship with PE3, and
PE2 has established an MSDP peer relationship with CE2. Figure 1-1002 shows the time
sequence for joining a multicast group when a CE serves as the RP.
Figure 1-1002 Time sequence for joining a multicast group when a CE serves as the RP
Table 1-302 describes the procedure for joining a multicast group when a CE serves as
the RP.
Table 1-302 Procedure for joining a multicast group when a CE serves as the RP
CE2 After CE2 receives an IGMP join request, CE2 generates a PIM (*, G) Join
message. Because CE2 is the RP, CE2 does not send the PIM (*, G) Join
message to its upstream.
CE1 After CE1 receives multicast traffic from the multicast server, CE1 sends a
PIM Register message to CE3.
CE3 Upon receipt of the PIM Register message, CE3 generates a PIM (S, G)
entry.
CE3 CE3 carries the PIM (S, G) entry information in an MSDP Source Active
(SA) message and sends the message to its MSDP peer, PE3.
PE3 Upon receipt of the MSDP SA message, PE3 generates a PIM (S, G) entry.
PE3 PE3 carries the PIM (S, G) entry information in a Source Active AD route
and sends the route to other PEs.
PE2 Upon receipt of the Source Active AD route, PE2 learns the PIM (S, G)
entry information carried in the route. Then, PE2 sends an MSDP SA
message to transmit the PIM (S, G) entry information to its MSDP peer,
CE2.
CE2 Upon receipt of the MSDP SA message, CE2 learns the PIM (S, G) entry
information carried in the message and generates a PIM (S, G) entry. Then,
CE2 initiates a PIM (S, G) join request to the multicast source. Finally, CE2
forwards the multicast traffic to multicast receivers.
Figure 1-1003 shows the time sequence for leaving a multicast group when a CE serves
as the RP.
Figure 1-1003 Time sequence for leaving a multicast group when a CE serves as the RP
Table 1-303 describes the procedure for leaving a multicast group when a CE serves as
the RP.
Table 1-303 Procedure for leaving a multicast group when a CE serves as the RP
PE2 Upon receipt of the PIM (S, G) Prune message, PE2 deletes the
corresponding PIM (S, G) entry. Then, PE2 sends a BGP Withdraw
message (Source Tree Join route) to PE1.
PE1 Upon receipt of the BGP Withdraw message (Source Tree Join route), PE1
deletes the previously recorded BGP C-multicast route (Source Tree Join
route) as well as the outbound interface in the PIM (S, G) entry.
Meanwhile, PE1 sends a PIM (S, G) Prune message to CE1.
CE1 Upon receipt of the PIM (S, G) Prune message, CE1 stops sending
multicast traffic to CE2.
MVPN NLRI
A PE that participates in an NG MVPN is required to send a BGP Update message containing
the MVPN NLRI. The SAFI of the MVPN NLRI is 5. Figure 1-1007 shows the MVPN NLRI
format.
Field Description
Route type Type of an MVPN route. Seven types of MVPN routes are
specified. For more information, see Table 1-306.
Length Length of the Route Type specific field of the MVPN
NLRI.
Route type specific MVPN route information. The value of this field depends
on the Route Type field. For more information, see Table
1-306.
Table 1-306 describes the types and functions of MVPN routes. Type 1-5 routes are called
MVPN A-D routes. These routes are used for MVPN membership autodiscovery and P2MP
tunnel establishment. Type 6 and Type 7 routes are called C-multicast routes (C is short for
Customer; C-multicast routes refer to multicast routes from the private network). These routes
are used for VPN multicast joining and VPN multicast traffic forwarding.
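The MVPN NLRI layout just described (a route type octet, a length octet, and a variable-length type-specific field) can be sketched with a minimal parser. This is an illustrative layout assumption based on the field descriptions above, not a full MVPN NLRI decoder:

```python
# Sketch: parse the MVPN NLRI header — Route Type (1 octet),
# Length (1 octet), then the Route Type specific bytes.
def parse_mvpn_nlri(data: bytes):
    route_type = data[0]
    length = data[1]                    # length of the type-specific field
    type_specific = data[2:2 + length]
    if len(type_specific) != length:
        raise ValueError("truncated MVPN NLRI")
    return route_type, type_specific

# A Type 1 (Intra-AS I-PMSI A-D) route with a 4-byte type-specific stub.
rtype, body = parse_mvpn_nlri(bytes([1, 4, 0xC0, 0xA8, 0x00, 0x01]))
assert rtype == 1 and body == bytes([0xC0, 0xA8, 0x00, 0x01])
```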
Table 1-307 Description of the fields for an Intra-AS I-PMSI A-D route
Field Description
RD Route distinguisher, an 8-byte field in a VPNv4 address. An
RD and a 4-byte IPv4 address prefix form a VPNv4 address,
which is used to differentiate IPv4 prefixes using the same
address space.
Multicast source length Length of the multicast source address. The value is 32 if
the multicast source address is an IPv4 address or 128 if the
multicast source address is an IPv6 address.
Multicast source Multicast source address.
Multicast group length Length of the multicast group address. The value is 32 if the
multicast group address is an IPv4 address or 128 if the
multicast group address is an IPv6 address.
Multicast group Multicast group address.
Originating router's IP address IP address of the router that originates the A-D route. In
NE20E, the value is equal to the MVPN ID of the router that originates the A-D route.
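The RD field described above can be illustrated with a short sketch of how an RD plus an IPv4 prefix form a VPNv4 address (a simplified string form for illustration, not the on-the-wire encoding):

```python
# Sketch: an RD prefixed to an IPv4 address yields a VPNv4 address,
# letting overlapping customer address spaces coexist.
def vpnv4(rd: str, prefix: str) -> str:
    return f"{rd}:{prefix}"

# Two VPNs can reuse 10.1.1.0 because their RDs differ.
assert vpnv4("100:1", "10.1.1.0") != vpnv4("100:2", "10.1.1.0")
assert vpnv4("100:1", "10.1.1.0") == "100:1:10.1.1.0"
```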
Field Description
Route key Set to the MVPN NLRI of the S-PMSI A-D route received.
Originating router's IP address IP address of the router that originates the A-D route. In
NE20E, the value is equal to the MVPN ID of the router that originates the A-D route.
Table 1-310 Description of the fields for a Source Active A-D route
Field Description
RD RD of the sender PE connected to the multicast source.
Multicast source length Length of the multicast source address. The value is 32 if
the multicast source address is an IPv4 address or 128 if the
multicast source address is an IPv6 address.
Multicast source Multicast source address.
Multicast group length Length of the multicast group address. The value is 32 if the
multicast group address is an IPv4 address or 128 if the
multicast group address is an IPv6 address.
Multicast group Multicast group address.
Table 1-311 Description of the fields for a Shared Tree Join route
Field Description
Route type MVPN route type. The value 6 indicates that the route is a
Type 6 route (Shared Tree Join route).
RT-import VRF Route Import Extended Community of the unicast
route to the multicast source. For more information about
the VRF Route Import Extended Community, see MVPN
Extended Communities.
The VRF Route Import Extended Community is used by
sender PEs to determine whether to process the BGP
C-multicast route sent by a receiver PE. This attribute also
helps a sender PE to determine to which VPN instance
routing table a BGP C-multicast route should be added.
Next hop Next hop address.
RD RD of the sender PE connected to the multicast source.
Source AS Source AS Extended Community of the unicast route to the
multicast source. For more information about the Source AS
Extended Community, see MVPN Extended Communities.
Multicast source length Length of the multicast source address. The value is 32 if
the multicast source address is an IPv4 address or 128 if the
multicast source address is an IPv6 address.
RP address Rendezvous point (RP) address.
Multicast group length Length of the multicast group address. The value is 32 if the
multicast group address is an IPv4 address or 128 if the
multicast group address is an IPv6 address.
Multicast group Multicast group address.
Table 1-312 Description of the fields for a Source Tree Join route
Field Description
RD RD of the sender PE connected to the multicast source.
Source AS Source AS Extended Community of the unicast route to the
multicast source. For more information about the Source AS
Extended Community, see MVPN Extended Communities.
Multicast source length Length of the multicast source address. The value is 32 if
the multicast source address is an IPv4 address or 128 if the
multicast source address is an IPv6 address.
Multicast source Multicast source address.
Multicast group length Length of the multicast group address. The value is 32 if the
multicast group address is an IPv4 address or 128 if the
multicast group address is an IPv6 address.
Multicast group Multicast group address.
Field Description
Flags Flags bits. Currently, only one flag indicating whether leaf
information is required is specified:
If the PMSI Tunnel attribute carried with a Type 3 route
has its Flags bit set to Leaf Information Not Required,
the receiver PE that receives the Type 3 route does not
need to respond.
If the PMSI Tunnel attribute carried with a Type 3 route
has its Flags bit set to Leaf Information Required, the
receiver PE that receives the Type 3 route needs to send
a Leaf A-D route in response.
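The two Flags cases above reduce to a single conditional on the receiver PE; the constant value and function name below are illustrative assumptions:

```python
# Sketch: a receiver PE's reaction to the Flags bit of the PMSI Tunnel
# attribute carried with a Type 3 (S-PMSI A-D) route.
LEAF_INFO_REQUIRED = 0x01  # illustrative bit position

def respond_to_type3(flags: int):
    if flags & LEAF_INFO_REQUIRED:
        return "send Leaf A-D route"   # Leaf Information Required
    return None                        # Leaf Information Not Required

assert respond_to_type3(LEAF_INFO_REQUIRED) == "send Leaf A-D route"
assert respond_to_type3(0) is None
```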
Field Description
Tunnel identifier For an mLDP P2MP LSP, the value is <Root node address, Opaque value>.
On an NG MVPN, the sender PE sets up the P-tunnel, and therefore is responsible for
originating the PMSI Tunnel attribute. The PMSI Tunnel attribute can be attached to Type 1-3
routes and sent to receiver PEs. Figure 1-1015 shows the format of an Intra-AS
I-PMSI A-D route carrying the PMSI Tunnel attribute.
Figure 1-1015 Intra-AS I-PMSI A-D route carrying the PMSI Tunnel attribute
Dual-root 1+1 protection
Protected objects: sender PEs (P-tunnels can also be protected after this solution is deployed).
Advantage: The network uses BFD to detect link faults, implementing fast route
convergence and high network reliability.
Disadvantages:
Redundant multicast traffic exists on the network, wasting bandwidth resources.
Only sender PEs and P-tunnels can be protected. Receiver PEs and CEs cannot be protected.
Table 1-315 Possible points of failure and corresponding network convergence processes
1 CE1 or link between PE1 and the multicast source
The network can rely only on unicast route convergence for recovery. The
handling process is as follows:
1. PE1 detects that the multicast source is unreachable.
2. PE1 sends to PE3 a BGP Withdraw message that carries information
about a VPNv4 route to the source.
3. After PE3 receives the message, PE3 preferentially selects the route
advertised by PE2 as the route to the multicast source. Then, PE3 sends
a BGP C-multicast route to PE2. Upon receipt, PE2 converts the route
to a PIM Join message and sends the message to CE2.
4. CE2 constructs an MDT and sends the multicast traffic received from
the multicast source to PE2. Upon receipt, PE2 sends the traffic to PE3
over the P2MP tunnel.
5. After PE3 receives the traffic, PE3 sends the traffic to CE3, which in
turn sends the traffic to the multicast receiver.
2 PE1 The network can rely only on unicast route convergence for recovery. The
handling process is as follows:
1. After PE3 uses BFD for BGP to detect that PE1 is unreachable, PE3
withdraws the route (to the multicast source) advertised by PE1 and
preferentially selects the route advertised by PE2 as the route to the
multicast source. Then, PE3 sends a BGP C-multicast route to PE2.
2. After PE2 receives the route, PE2 converts the route to a PIM Join
message and sends the message to CE2.
3. CE2 constructs an MDT and sends the multicast traffic received from
the multicast source to PE2. Upon receipt, PE2 sends the traffic to PE3
over the P2MP tunnel.
4. After PE3 receives the traffic, PE3 sends the traffic to CE3, which in
turn sends the traffic to the multicast receiver.
3. Public network link: If MPLS tunnel protection is configured, the network relies on MPLS tunnel protection for recovery. The MVPN is unaware of public network link changes. If MPLS tunnel protection is not configured, the network relies on unicast route convergence for recovery. In this situation, the handling process is similar to the process for handling PE1 failures.
4. PE3: The network can rely only on unicast route convergence for recovery. The handling process is as follows:
1. When CE3 detects that PE3 is unreachable, CE3 withdraws the unicast
route (to the multicast source) advertised by PE3 to trigger route
convergence. During route convergence, CE3 preferentially selects the
route advertised by PE4 as the route to the multicast source.
2. CE3 sends a PIM Join message to PE4.
3. After PE4 receives the message, PE4 converts the message to a BGP
C-multicast route and sends the route to PE1.
4. After PE1 receives the route, PE1 converts the route to a PIM Join
message and sends the message to CE1.
5. CE1 constructs an MDT and sends the multicast traffic received from the multicast source to PE1. Upon receipt, PE1 sends the traffic to PE4 over the P2MP tunnel.
In single-MVPN networking protection, if PE3 and PE4 both receive PIM Join messages but
their upstream peers are different (for example, the upstream peer is PE1 for PE3 and PE2 for
PE4), PE1 and PE2 both send multicast traffic to PE3 and PE4. In this situation, you must
ensure that PE3 accepts only the multicast traffic from PE1 and PE4 accepts only the
multicast traffic from PE2. To do so, you must create multiple P2MP tunnels (with each
I-PMSI tunnel corresponding to one P2MP tunnel) when configuring a receiver PE to join
multiple I-PMSI tunnels. Then, when the multicast traffic reaches the receiver PE over
multiple I-PMSI tunnels, the receiver PE can identify the P2MP tunnel corresponding to the
upstream neighbor in its VPN instance multicast routing table. The receiver PE permits traffic
only in the identified P2MP tunnel but discards traffic in all other tunnels.
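The acceptance rule described above can be illustrated with a minimal Python sketch. This is an illustrative model only, not the device's implementation; the class and all names in it are hypothetical.

```python
# Hypothetical model of a receiver PE that joins multiple I-PMSI tunnels,
# one P2MP tunnel per sender PE, and accepts (S, G) traffic only from the
# tunnel rooted at the upstream neighbor in its VPN instance routing table.

class ReceiverPE:
    def __init__(self):
        # VPN instance multicast routing table: (C-S, C-G) -> upstream PE
        self.upstream = {}
        # One P2MP tunnel per I-PMSI tunnel: root (sender PE) -> tunnel ID
        self.tunnel_of_root = {}

    def join(self, source, group, upstream_pe):
        """Record the upstream neighbor chosen for this (S, G)."""
        self.upstream[(source, group)] = upstream_pe

    def accept(self, source, group, arriving_tunnel):
        """Permit traffic only in the tunnel rooted at the chosen upstream;
        traffic in all other tunnels is discarded."""
        expected_root = self.upstream.get((source, group))
        if expected_root is None:
            return False  # no receivers for this (S, G)
        return self.tunnel_of_root.get(expected_root) == arriving_tunnel

# Example: PE3 chose PE1 as the upstream for (10.1.1.1, 232.1.1.1), so
# traffic arriving over PE2's tunnel for the same (S, G) is dropped.
pe3 = ReceiverPE()
pe3.tunnel_of_root = {"PE1": "tunnel-1", "PE2": "tunnel-2"}
pe3.join("10.1.1.1", "232.1.1.1", "PE1")
```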
CE3 serves as a DR. After CE3 receives an IGMP join request from a multicast receiver,
CE3 sends a PIM Join message to PE3. Upon receipt, PE3 converts the message to a
BGP C-multicast route and sends the route to PE1, its BGP MVPN peer. Upon receipt,
PE1 converts the BGP C-multicast route to a PIM Join message and sends the message to
CE1. Upon receipt, CE1 establishes an MDT. Then, multicast traffic can be transmitted
from the multicast source to the multicast receiver along the path CE1 -> PE1 -> P1 ->
PE3 -> CE3.
CE4 serves as a non-DR. After CE4 receives an IGMP join request from a multicast
receiver, CE4 does not send a PIM Join message to its upstream. To implement traffic
redundancy, configure static IGMP joining on CE4, so that CE4 can send a PIM Join
message to PE4. After PE4 receives the message, PE4 converts the message to a BGP
C-multicast route and sends the route to PE2. Upon receipt, PE2 converts the route to a
PIM Join message and sends the message to CE2. Upon receipt, CE2 establishes an
MDT. Then, multicast traffic can be transmitted along the path CE2 -> PE2 -> P2 -> PE4
-> CE4. The multicast traffic will not be forwarded to receivers because CE4 is a
non-DR.
Table 1-316 Possible points of failure and corresponding network convergence processes
No. Point of Failure Network Convergence Process
1. CE1 or link: The network relies on unicast route convergence for recovery.
1.11.8.6 Applications
1.11.8.6.1 Application of NG MVPN to IPTV Services
Overview
Multicast services, such as IPTV services, video conferences, and real-time multi-player
online games, are increasingly used in daily life. These services are transmitted over service
bearer networks that need to:
Forward multicast traffic smoothly even during traffic congestion.
Detect network faults in a timely manner and quickly switch traffic from faulty links to
normal links.
Ensure multicast traffic security in real time.
Networking Description
NG MVPN is deployed on the service provider's backbone network to solve multicast service
issues related to traffic congestion, transmission reliability, and data security. Figure 1-1019
shows the application of NG MVPN to IPTV services.
Feature Deployment
In this scenario, NG MVPN deployment consists of the following aspects:
On the control plane
− Configure a BGP/MPLS IP VPN on the service provider's backbone network and
ensure that this VPN runs properly.
− Configure MVPN on the service provider's backbone network, so that PEs
belonging to the same MVPN can use BGP to exchange BGP A-D and BGP
C-multicast routes.
− Configure P2MP tunnels on the service provider's backbone network.
Terms
Term Definition
BFD Bidirectional Forwarding Detection. A common fault detecting mechanism
that uses Hello packets to quickly detect a link status change and notify a
protocol of the change. The protocol then determines whether to establish or
tear down a peer relationship.
DR Designated router. A router that applies only to PIM-SM. On the network
segment that connects to a multicast source, a DR sends Register messages to
the RP. On the network segment that connects to multicast receivers, a DR
sends Join messages to the RP. In SSM mode, a DR at the group member side
directly sends Join messages to a multicast source.
IGMP Internet Group Management Protocol. A signaling mechanism that
implements communication between hosts and routers on IP multicast leaf
networks.
By periodically sending IGMP messages, a host joins or leaves a multicast
group, and a router identifies whether a multicast group contains members.
Join A type of message used on PIM-SM networks. When a host requests to join a
network segment, the DR of the network segment sends a Join message to the
RP hop by hop to generate a multicast route. When the RP starts an SPT
switchover, the RP sends a Join message to the source hop by hop to generate
a multicast route.
PIM Protocol Independent Multicast. A multicast routing protocol.
Reachable unicast routes are the basis of PIM forwarding. PIM uses the
existing unicast routing information to perform RPF check on multicast
packets to create multicast routing entries and set up an MDT.
Prune A type of message. If there are no multicast group members on a downstream
interface, a router sends a prune message to the upstream node. After
receiving the prune message, the upstream node removes the downstream
interface from the downstream interface list and stops forwarding data of the
specified group to the downstream interface.
P-tunnel A public network tunnel used to transmit VPN multicast traffic. A P-tunnel
can be established using GRE, MPLS, or other tunneling technologies.
PMSI A logical tunnel used by a public network to transmit VPN multicast traffic. A
sender PE transmits VPN multicast traffic to receiver PEs over a PMSI
tunnel. Receiver PEs determine whether to accept the VPN multicast traffic
based on PMSI tunnel information. PMSI tunnels are categorized as I-PMSI
or S-PMSI tunnels.
RD Route distinguisher. An 8-byte field in a VPN IPv4 address. An RD together
with a 4-byte IPv4 address prefix constructs a VPN IPv4 address to
differentiate the IPv4 prefixes using the same address space.
receiver A site where multicast receivers reside.
site
receiver A PE connected to a receiver site.
PE
sender site A site where a multicast source resides.
sender PE A PE connected to a sender site.
(S, G) A multicast routing entry. S indicates a multicast source, and G indicates a
multicast group. After a multicast packet with S as the source address and G
as the group address reaches a router, it is forwarded through the downstream
interfaces of the (S, G) entry. The packet is expressed as an (S, G) packet.
(*, G) A PIM routing entry. * indicates any source, and G indicates a multicast
group. The (*, G) entry applies to all multicast packets whose group address
is G. All multicast packets that are sent to G are forwarded through the
downstream interfaces of the (*, G) entry, regardless of which source sends
the packets.
tunnel ID A group of information, including the token, the slot number of an outgoing
interface, and the tunnel type.
VPN Virtual private network. A technology that implements a private network over
a public network.
VPN An entity that is set up and maintained by the PE devices for
instance directly-connected sites. Each site has its VPN instance on a PE device. A
VPN instance is also called a VPN routing and forwarding (VRF) table. A PE
device has multiple forwarding tables, including a public-network routing
table and one or multiple VRF tables.
VPN A BGP extended community attribute that is also called Route Target. In
Target BGP/MPLS IP VPN, VPN Target controls the advertisement of VPN routing
information. VPN Target defines which sites can receive a VPN-IPv4 route
and from which sites a PE device can receive routes.
MVPN Controls MVPN A-D route advertisement. MVPN Target functions in a similar
Target way as VPN Target on unicast VPNs.
A-D autodiscovery
AS autonomous system
BGP Border Gateway Protocol
CE customer edge
C-G customer multicast group address
C-S customer multicast source address
FRR fast reroute
LDP Label Distribution Protocol
mLDP Multipoint LDP
MPLS Multiprotocol Label Switching
MVPN multicast VPN
NG MVPN next-generation multicast VPN
NLRI network layer reachability information
P2MP point-to-multipoint
P provider (device)
PE provider edge
PIM Protocol Independent Multicast
PIM-SM Protocol Independent Multicast-Sparse Mode
RP rendezvous point
RPF reverse path forwarding
RSVP Resource Reservation Protocol
SSM source-specific multicast
TE traffic engineering
VPN virtual private network
1.11.9 MLD
1.11.9.1 Introduction
Definition
MLD manages IPv6 multicast members. MLD sets up and maintains member relationships
between IPv6 hosts and the multicast router to which the hosts are directly connected.
MLD has two versions: MLDv1 and MLDv2. Both MLD versions support the ASM model.
MLDv2 supports the SSM model independently, while MLDv1 needs to work with SSM
mapping to support the SSM model.
MLD applies to IPv6 and provides functions similar to those that IGMP provides for IPv4.
MLDv1 is similar to IGMPv2, and MLDv2 is similar to IGMPv3.
Some features of MLD and IGMP are implemented in the same manner. The following
common features of MLD and IGMP are not mentioned:
MLD Router-Alert option
MLD Prompt-Leave
MLD static-group
MLD group-policy
MLD SSM mapping
This section describes MLD principles and unique features of MLD, including the MLD
querier election mechanism and MLD group compatibility.
Configuring an ACL filtering rule is mandatory for source address-based MLD message
filtering, but optional for source address-based IGMP message filtering.
Purpose
MLD allows hosts to dynamically join IPv6 multicast groups and manages multicast group
members. MLD is configured on the multicast router to which hosts are directly connected.
1.11.9.2 Principles
1.11.9.2.1 MLDv1 and MLDv2
By sending Query messages to hosts and receiving Report messages and Done messages from
hosts, a multicast router can identify multicast groups that contain receivers on a network
segment. A multicast router forwards multicast data to a network segment only if the network
segment has receivers. Hosts can determine whether to join or leave a multicast group.
As shown in Figure 1-1020, MLD-enabled Device A functions as the querier to periodically
send Multicast Listener Query messages. All hosts (Host A, Host B, and Host C) on the same
network segment of Device A can receive these Multicast Listener Query messages.
When a host (for example, Host A) receives a Multicast Listener Query message of
group G, the processing flow is as follows:
If Host A is already a member of group G, Host A replies with a Multicast Listener
Report message of group G at a random time point within the response period specified
by Device A.
After receiving the Multicast Listener Report message, Device A records information
about group G and forwards the multicast data to the network segment of the host
interface that is directly connected to Device A. Meanwhile, Device A starts a timer for
group G or resets the timer if it has been started. If no members of group G respond to
Device A within the interval specified by the timer, Device A stops forwarding the
multicast data of group G.
If Host A is not a member of any multicast group, Host A does not respond to the
Multicast Listener Query message from Device A.
When a host (for example, Host A) joins a multicast group G, the processing flow is as
follows:
Host A sends a Multicast Listener Report message of group G to Device A, instructing
Device A to update its multicast group information. Subsequent Multicast Listener
Report messages of group G are triggered by Multicast Listener Query messages sent by
Device A.
When a host (for example, Host A) leaves a multicast group G, the processing flow is as
follows:
Host A sends a Multicast Listener Done message of group G to Device A. After receiving
the Multicast Listener Done message, Device A triggers a query on group G to check
whether group G has other receivers. If Device A does not receive Multicast Listener
Report messages of group G within the period specified by the query message, Device A
deletes the information about group G, and stops forwarding the multicast traffic of
group G.
MLDv1 supports Report message suppression to reduce repetitive Report messages. This
function works as follows:
After a host (for example, Host A) joins a multicast group G, Host A receives a Multicast
Listener Query message from a router and then randomly selects a value from 0 to Maximum
Response Delay (specified in the Multicast Listener Query message) as the timer value. When
the timer expires, Host A sends a Multicast Listener Report message of group G to the router.
If Host A receives a Multicast Listener Report message of group G from another host in group
G before the timer expires, Host A does not send the Multicast Listener Report message of
group G to the router.
When a host leaves group G, the host sends a Multicast Listener Done message of group G to
a router. Because of the Report message suppression mechanism in MLDv1, the router cannot
determine whether another host exists in group G. Therefore, the router triggers a query on
group G. If another host exists in group G, the host sends a Multicast Listener Report message
of group G to the router.
If a router sends the query on group G for a specified number of times, but does not receive a
Multicast Listener Report message for group G, the router deletes information about group G
and stops forwarding multicast data of group G.
Both MLD queriers and non-queriers can process Multicast Listener Report messages, but only
queriers send Multicast Listener Query messages. MLD non-queriers cannot process MLDv1
Multicast Listener Done messages.
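The MLDv1 Report suppression mechanism described above can be sketched in Python. This is an illustrative simulation (function and host names are hypothetical), assuming each member independently picks a random delay up to the Maximum Response Delay carried in the Query.

```python
import random

def schedule_reports(members, max_response_delay, rng=random.Random(1)):
    """Simulate MLDv1 Report suppression for one group after a Query.

    Each member picks a random timer value in [0, max_response_delay].
    The host whose timer expires first sends the Multicast Listener
    Report; every other host sees that Report before its own timer
    expires and suppresses (cancels) its Report.
    """
    delays = {host: rng.uniform(0, max_response_delay) for host in members}
    reporter = min(delays, key=delays.get)          # first timer to expire
    suppressed = [h for h in members if h != reporter]
    return reporter, suppressed
```

Only one Report per group reaches the router, which is why (as noted above) the router cannot tell how many members a group still has when one of them leaves.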
ALLOW_NEW_SOURCES: indicates that a host still wants to receive data from certain
multicast sources. If the current relationship is Include, certain sources are added to the
current source list. If the current relationship is Exclude, certain sources are deleted from
the current source list.
BLOCK_OLD_SOURCES: indicates that a host does not want to receive data from
certain multicast sources any longer. If the current relationship is Include, certain sources
are deleted from the current source list. If the current relationship is Exclude, certain
sources are added to the current source list.
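The two source-list record types above can be expressed as a small Python sketch. This is an illustrative model of the rules just listed; the function name and source labels are hypothetical.

```python
def apply_record(filter_mode, source_list, record_type, sources):
    """Apply an MLDv2 state-change record to a (filter_mode, source_list)
    pair, following the rules described above:

    ALLOW_NEW_SOURCES: Include -> add sources; Exclude -> delete sources.
    BLOCK_OLD_SOURCES: Include -> delete sources; Exclude -> add sources.
    """
    current = set(source_list)
    changed = set(sources)
    if record_type == "ALLOW_NEW_SOURCES":
        current = current | changed if filter_mode == "Include" else current - changed
    elif record_type == "BLOCK_OLD_SOURCES":
        current = current - changed if filter_mode == "Include" else current | changed
    else:
        raise ValueError("unknown record type: %s" % record_type)
    return sorted(current)
```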
On the router side, the querier sends Multicast Listener Query messages and receives
Multicast Listener Report messages. In this manner, the router can identify which multicast group on the
network segment contains receivers, and then forwards the multicast data to the network
segment accordingly. In MLDv2, records of multicast groups can be filtered in either Include
mode or Exclude mode.
In Include mode:
− The multicast source in the activated state requires the router to forward its data.
− The multicast source in the deactivated state is deleted by the router and data
forwarding for the multicast source is ceased.
In Exclude mode:
− The data of a multicast source in the activated state is forwarded. That is, the
router forwards the data of this source regardless of whether hosts on the same
network segment of the router interface require it.
− The multicast source in the deactivated state requires no data forwarding.
− Data of the multicast source that is not recorded in the multicast group should be
forwarded.
A non-querier only receives Multicast Listener Report messages from hosts to learn
which multicast group has receivers. Then, based on the querier's action, the non-querier
identifies which receivers leave multicast groups.
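The router-side forwarding decision under the two filter modes can be sketched as follows. This is a simplified illustrative model: it treats the Exclude-mode source list as the deactivated (filtered-out) sources, and the function name is hypothetical.

```python
def should_forward(filter_mode, source_list, source):
    """Decide whether the router forwards a group's data from `source`.

    Include mode: forward only data from sources in the list (the
    sources that hosts asked for).
    Exclude mode: forward data from any source except the listed
    (deactivated) ones, including sources not recorded at all.
    """
    if filter_mode == "Include":
        return source in source_list
    return source not in source_list  # Exclude mode
```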
Generally, a network segment has only one querier. Multicast devices follow the same
principle to select a querier. The process is as follows (using Device A, Device B, and Device
C as examples):
After MLD is enabled on Device A, Device A considers itself a querier in the startup
process by default and sends Multicast Listener Query messages on the network segment.
If Device A receives a Multicast Listener Query message from Device B that has a lower
link-local address, Device A changes from a querier to a non-querier. Device A starts the
another-querier-existing timer and records Device B as the querier of the network
segment.
If Device A is a non-querier and receives a Multicast Listener Query message from
Device B in the querier state, Device A updates another-querier-existing timer; if the
received Multicast Listener Query message is sent from Device C whose link-local
address is lower than that of Device B in the querier state, Device A records Device C as
the querier of the network segment and updates the another-querier-existing timer.
If Device A is a non-querier and the another-querier-existing timer expires, Device A
changes to a querier.
In this document version, querier election can be implemented only among multicast devices that run the
same MLD version on a network segment.
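The querier election steps above can be sketched in Python. This is an illustrative model (the class and its method names are hypothetical); timer handling is reduced to a single callback, and addresses are compared numerically as the election requires.

```python
from ipaddress import IPv6Address

class MldDevice:
    """Minimal model of MLD querier election on one network segment."""

    def __init__(self, link_local):
        self.addr = IPv6Address(link_local)
        # On startup a device considers itself the querier by default.
        self.role = "querier"
        self.current_querier = self.addr

    def on_query(self, sender_link_local):
        """Process a Multicast Listener Query heard on the segment.

        A device defers to any querier with a lower link-local address
        and records the lowest address it has heard as the querier.
        (A real device would also restart the another-querier-existing
        timer here.)
        """
        sender = IPv6Address(sender_link_local)
        if sender < self.current_querier:
            self.role = "non-querier"
            self.current_querier = sender

    def on_another_querier_timer_expired(self):
        """If the current querier falls silent, become the querier again."""
        self.role = "querier"
        self.current_querier = self.addr
```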
Background
When a multicast device is directly connected to user hosts, the multicast device sends MLD
Query messages to and receives MLD Report and Done messages from the user hosts to
identify the multicast groups that have attached receivers on the shared network segment.
The device directly connected to a multicast device, however, may not be a host but an MLD
proxy-capable access device to which hosts are connected. If you configure only MLD on the
multicast device, access device, and hosts, the multicast and access devices need to exchange
a large number of packets.
To resolve this problem, enable MLD on-demand on the multicast device. The multicast
device sends only one general query message to the access device. After receiving the general
query message, the access device sends the collected Join and Leave status of multicast
groups to the multicast device. The multicast device uses the Join and Leave status of the
multicast groups to maintain multicast group memberships on the local network segment.
Benefits
MLD on-demand reduces packet exchanges between a multicast device and its connected
access device and reduces the loads of these devices.
Related Concepts
MLD on-demand
MLD on-demand enables a multicast device to send only one MLD general query message to
its connected access device (MLD proxy-capable) and to use Join/Leave status of multicast
groups reported by its connected access device to maintain MLD group memberships.
Implementation
When a multicast device is directly connected to user hosts, the multicast device sends MLD
Query messages to and receives MLD Report and Done messages from the user hosts to
identify the multicast groups that have attached receivers on the shared network segment. The
device directly connected to the multicast device, however, may not be a host but an MLD
proxy-capable access device, as shown in Figure 1-1021.
The provider edge (PE) is a multicast device. The customer edge (CE) is an access device.
On network a shown in Figure 1-1021, if MLD on-demand is not enabled on the PE,
the PE sends a large number of MLD Query messages to the CE, and the CE sends a
large number of Report and Done messages to the PE. As a result, lots of PE and CE
resources are consumed.
On network b shown in Figure 1-1021, after MLD on-demand is enabled on the PE,
the PE sends only one general query message to the CE. After receiving the general
query message from the PE, the CE sends the collected Join and Leave status of MLD
groups to the PE. The CE sends a Report or Done message for a group to the PE only
when the Join or Leave status of the group changes. To be specific, the CE sends an
MLD Report message for a multicast group to the PE only when the first user joins the
multicast group and sends a Done message only when the last user leaves the multicast
group.
After you enable MLD on-demand on a multicast device connected to an MLD proxy-capable access
device, the multicast device implements MLD differently from standard MLD in the
following aspects:
The records on dynamically joined multicast groups on the multicast device interface connected to
the access device do not time out.
The multicast device interface connected to the access device sends only one MLD general query
message to the access device.
The multicast device interface connected to the access device directly deletes the entry for a group
after it receives an MLD Done message for the group.
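The edge-triggered reporting behavior of the access device under MLD on-demand can be sketched as follows. This is an illustrative model (the class and names are hypothetical): the CE sends a Report only when the first user joins a group and a Done only when the last user leaves it.

```python
class OnDemandCE:
    """Model of an MLD proxy-capable access device (CE) whose upstream
    PE has MLD on-demand enabled: it reports only membership changes."""

    def __init__(self):
        self.members = {}  # group address -> set of attached users

    def user_join(self, group, user):
        """Return the message sent upstream to the PE, if any."""
        first = not self.members.get(group)
        self.members.setdefault(group, set()).add(user)
        return "Report" if first else None  # only the first join is reported

    def user_leave(self, group, user):
        """Return the message sent upstream to the PE, if any."""
        self.members.get(group, set()).discard(user)
        if not self.members.get(group):
            return "Done"  # only the last leave is reported
        return None
```

Because only these edge events cross the PE-CE link, the two devices exchange far fewer packets than with standard MLD, which is the stated benefit of the feature.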
Definition
User-side multicast enables a BRAS to identify users of a multicast program.
In Figure 1-1023, when the set top box (STB) and phone users go online, they send Internet
Group Management Protocol (IGMP) Report messages of a multicast program to the BRAS.
After receiving the messages, the BRAS identifies the users and sends a Protocol Independent
Multicast (PIM) Join message to the network-side rendezvous point (RP) or the source's
designated router (DR). The RP or source's DR creates multicast forwarding entries for the
users and receives the required multicast traffic from the source. The BRAS finally sends the
multicast traffic to the STB and phone users based on their forwarding entries and replication
modes. The multicast replication in this example is based on sessions.
User-side multicast supports both IPv4 and IPv6. For IPv4 users, user-side multicast applies to both
private and public networks. For IPv6 users, user-side multicast applies only to public networks.
On Layer 2, user-side multicast supports the PPPoE and IPoE access modes for common users and the
IPoE access mode for E-Line users.
Purpose
Because conventional multicast does not provide a method to identify users, carriers cannot
effectively manage multicast users who access services such as Internet Protocol television
(IPTV). Such users can join multicast groups, without notification, by sending Internet Group
Management Protocol (IGMP) Report messages. To identify these users and allow for
improved management of them, Huawei provides the user-side multicast feature.
Benefits
User-side multicast can identify users and the programs they join or leave for carriers to better
manage and control online users.
1.11.10.2 Principles
1.11.10.2.1 Overview
Table 1-319 describes multicast service processes.
Related Concepts
Multicast program
A multicast program is an IPTV channel or program and is identified by a multicast source
address and a multicast group.
Access mode
In user-side multicast, only the Point-to-Point Protocol over Ethernet (PPPoE) access and IP
over Ethernet (IPoE) access modes are supported, and only session-based replication is
supported.
PPPoE access mode: allows a remote access device to provide access services for hosts
on Ethernet networks and to implement user access control and accounting. PPPoE is a
link layer protocol that transmits PPP datagrams through PPP sessions established over
point-to-point connections on Ethernet networks.
IPoE access mode: allows the BRAS to perform authentication and authorization on
users and user services based on the physical or logical user information, such as the
MAC address, VLAN ID, and Option 82, carried in IPoE packets. In IPv4 network
access where a user terminal connects to an Ethernet interface of a BRAS through a
Layer 2 device, the user IP packets are encapsulated into IPoE packets by the user
Ethernet interface before they are transmitted to the BRAS through the Layer 2 device.
Table 1-320 Differences between PPPoE access users and IPoE access users in user-side multicast
If all of the preceding multicast replication modes are configured, the priority is as follows in descending
order: replication by interface + VLAN, session-based replication, replication by multicast VLAN, and
replication by interface.
In addition to multicast data packet replication, IGMP Query messages are sent based on the
preceding multicast replication modes.
Session-based multicast replication is used in the following illustration of the multicast program join
process. Multicast program join processes of other multicast replication modes are similar to that of
session-based multicast replication.
Accessing the Internet through PPPoE or IPoE is a prerequisite for users to join multicast
programs. Figure 1-1025 illustrates the procedures of multicast program join, and Table 1-322
describes each procedure.
Session-based multicast replication is used in the following illustration of the multicast program leave
process. Multicast program leave processes of other multicast replication modes are similar to that of
session-based multicast replication.
BRAS: The BRAS stops sending to the STB the multicast traffic of the corresponding
multicast group it joined.
BRAS: If there is no member in the multicast group after the STB user leaves, the BRAS
sends a PIM Prune message to the RP or source's DR to stop the multicast traffic
replication to the group.
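The BRAS leave handling can be sketched as follows. This is an illustrative model (function and names are hypothetical), assuming standard PIM semantics in which a Prune message sent toward the RP or source's DR stops replication for a group with no remaining members.

```python
def handle_done(groups, group, user):
    """Process an IGMP leave for `user` on the BRAS.

    groups: dict mapping group address -> set of member sessions.
    Returns the action taken toward the RP/source's DR.
    """
    members = groups.get(group, set())
    members.discard(user)                  # stop sending traffic to this user
    if not members:
        groups.pop(group, None)            # last member left the group
        return "send PIM Prune"            # stop replication for the group
    return "keep forwarding"               # other members still need traffic
```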
Session-based multicast replication is used in the following illustration of how a user leaves all
multicast groups by going offline. The processes for other multicast replication modes are
similar to that of session-based multicast replication.
Figure 1-1027 Process of multicast program leave of all multicast groups by going offline
Table 1-324 Key actions in each step of multicast program leave of all multicast groups by going
offline
Overview
User-side call admission control (CAC) is a bandwidth management and control method used
to guarantee multicast service quality of online users.
A conventional quality-guarantee mechanism is to limit the maximum number of multicast
groups that users can join. With this mechanism, a BRAS checks whether the maximum
number of multicast groups has been exceeded after receiving a Join message from a user. If
the maximum number has been exceeded, the device drops the Join message and denies the
user request. This mechanism alone, however, is no longer sufficient because IPTV program
varieties keep increasing. A high upper limit allows the device to accept many join requests
but cannot prevent the device from dropping messages due to limited bandwidth resources on
interfaces.
User-side multicast CAC addresses these issues by enabling a BRAS to limit bandwidth for
users.
User-side multicast CAC enables a BRAS to check the bandwidth limit and deny user
requests if the limit has been exceeded.
User-side multicast CAC can be implemented for users in a specific domain and on a specific
interface. It works with the multicast group limit function to implement the following
functions:
User-level bandwidth limit: A bandwidth limit can be set for each user in a specific user
access domain, and new service requests of a user are denied when the bandwidth
consumed by the user exceeds the bandwidth limit.
Interface-level bandwidth limit: A bandwidth limit can be set for a user access interface,
and new service requests are denied when the consumed bandwidth exceeds the
bandwidth limit.
Principles
Figure 1-1028 shows the working principles of user-side multicast CAC in a process of going
online.
The STB and phone users go online.
The STB and phone users send IGMP Report messages to request for multicast services.
The BRAS receives the IGMP Report messages and checks the bandwidth limits
configured for the user access domain and interface.
− If the remaining bandwidth resources are sufficient for the users:
The BRAS sends a PIM Join message to the RP or source's DR. The RP or source's
DR creates a multicast forwarding entry, and sends the service flow received from
the source to the BRAS. The BRAS forwards the flow to the users based on the
multicast forwarding entry and multicast traffic replication mode (this example
uses session-based replication).
− If the remaining bandwidth resources are insufficient for the users, the BRAS
discards the IGMP Report message and denies the service requests.
User-side multicast CAC supports only PPPoE and IPoE access modes for Layer 2 common users.
The BRAS supports four multicast traffic replication modes: by session, by interface + VLAN, by
multicast VLAN, and by interface.
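The admission check described above can be sketched in Python. This is an illustrative model (the class, names, and kbit/s figures are hypothetical): a join is admitted only if both the per-user limit for the access domain and the per-interface limit still have room.

```python
class CacChecker:
    """Minimal model of user-side multicast CAC on a BRAS."""

    def __init__(self, user_limit_kbps, intf_limit_kbps):
        self.user_limit = user_limit_kbps    # limit per user in the domain
        self.intf_limit = intf_limit_kbps    # limit on the access interface
        self.user_used = {}                  # user -> bandwidth in use
        self.intf_used = 0                   # interface bandwidth in use

    def admit(self, user, channel_kbps):
        """Check both limits for a join request; deny it if either would
        be exceeded (the BRAS then discards the IGMP Report message)."""
        used = self.user_used.get(user, 0)
        if used + channel_kbps > self.user_limit:
            return False  # user-level bandwidth limit exceeded
        if self.intf_used + channel_kbps > self.intf_limit:
            return False  # interface-level bandwidth limit exceeded
        self.user_used[user] = used + channel_kbps
        self.intf_used += channel_kbps
        return True
```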
Purpose
Limiting the maximum number of multicast groups can no longer guarantee service quality
because of the increase in IPTV service varieties and the large bandwidth requirement differences
among multicast channels. Therefore, user-side multicast CAC was introduced to prevent
bandwidth resources from being exhausted, thus guaranteeing the IPTV service quality of
online users.
Benefits
User-side multicast CAC brings the following benefits:
Guarantees IPTV service quality for online users.
Allows new user requests to be denied when a large number of multicast channels are
requested and bandwidth resources are insufficient.
1.11.10.3 Applications
1.11.10.3.1 User-side Multicast for PPPoE Access Users
Service Description
Because conventional multicast does not provide a method to identify users, carriers cannot
effectively manage multicast users who access services such as Internet Protocol television
(IPTV). Such users can join multicast groups, without notification, by sending Internet Group
Management Protocol (IGMP) Report messages.
To identify these users and allow for improved management of them, Huawei provides the
user-side multicast feature.
Networking Description
In Figure 1-1029, a set top box (STB) user initiates a dial up connection through
Point-to-Point Protocol over Ethernet (PPPoE) to the broadband remote access server (BRAS).
The BRAS then assigns an IPv4 address to the user for Internet access. To join a multicast
program, the user sends an IGMP Report message to the BRAS, and the BRAS creates a
multicast forwarding entry for the user. In this entry, the downstream interface is the interface
that connects to the user. After the entry is created, the BRAS sends a Protocol Independent
Multicast (PIM) Join message to the network-side rendezvous point (RP) or the source's
designated router (DR). Upon receipt of this message, the RP or source's DR sends to the
BRAS the multicast traffic of the program that the user wants to join. The BRAS then
replicates and sends the multicast traffic to the user based on the multicast forwarding entry.
Feature Deployment
Deployment for the user-side multicast feature is as follows:
1. Configure an IPv4 address pool on the BRAS to assign IPv4 addresses to online users.
2. Configure Authentication, Authorization and Accounting (AAA) schemes.
3. Configure a domain for user management, such as AAA.
4. Configure the PPPoE access mode on the BRAS.
a. Configure a virtual template (VT) interface.
b. Bind a VT to an interface.
c. Bind the sub-interface to the virtual local area network (VLAN) if users are
connected to the sub-interface. (For users connected to the main interface, skip this
step.)
d. Configure a broadband access server (BAS) interface and specify a user access type
for the interface. (The BAS interface can be a main interface, a common
sub-interface, or a QinQ sub-interface.)
5. Configure basic multicast functions on the BRAS and on the RP or source's DR.
a. Enable multicast routing.
b. Enable Protocol Independent Multicast-Sparse Mode (PIM-SM) on BRAS
interfaces and on the RP or source's DR interfaces.
c. Enable IGMP on the BRAS interface connected to users.
Service Description
Because conventional multicast does not provide a method to identify users, carriers cannot
effectively manage multicast users who access services such as Internet Protocol television
(IPTV). Such users can join multicast groups, without notification, by sending Internet Group
Management Protocol (IGMP) Report messages.
To identify these users and allow for improved management of them, Huawei provides the
user-side multicast feature.
Networking Description
In Figure 1-1030, a set-top box (STB) user connects to the BRAS through IPoE. (With IPoE,
the user does not need to initiate a dial-up connection, and so no client software is required.)
The BRAS then assigns an IPv4 address to the user for Internet access. To join a multicast
program, the user sends an IGMP Report message to the BRAS. The BRAS then creates a
multicast forwarding entry and establishes an outbound interface for the user. After the entry
is created, the BRAS sends a PIM Join message to the network-side RP or the source's DR.
Upon receipt of this message, the RP or source's DR sends to the BRAS the multicast data of
the program that the user wants to join. The BRAS then replicates and sends the multicast
data to the user based on the multicast forwarding entry.
Feature Deployment
Deployment for the user-side multicast feature is as follows:
Configure an IPv4 address pool on the BRAS to assign IPv4 addresses to online users.
Service Overview
User-side multicast VPN enables a BRAS to identify users of a multicast program, which
allows for improved management of them.
Networking Description
As shown in Figure 1-1031, the STB user and the multicast source belong to the same VPN
instance, which is a prerequisite for users to join programs of the multicast source on the VPN
that they belong to. To join a multicast program after accessing the Layer 3 VPN, the STB
user sends an IGMP Report message to the BRAS. Upon receipt of the IGMP Report
message, the BRAS identifies the domain and private VPN instance of the STB user. Then the
BRAS creates the multicast entry for the STB user in the corresponding VPN instance and
sends the PIM Join message to the network-side multicast source or RP for the multicast
traffic. As the final step, the BRAS replicates the multicast traffic to the STB user based on
different multicast replication modes.
Feature Deployment
Deployment for the user-side multicast VPN is as follows:
1. Configure an IPv4 address pool on the BRAS to assign IPv4 addresses to online users.
2. Configure Authentication, Authorization and Accounting (AAA) schemes.
3. Configure a domain for user management, such as AAA.
4. Configure the PPPoE or IPoE access mode on the BRAS.
5. Configure basic multicast VPN functions.
6. Configure a multicast replication mode on a BAS interface. By default, multicast
replication by interface is configured. You can choose to configure one of the following
multicast replication modes:
− Session-based multicast replication
− Multicast replication by interface + VLAN
− Multicast replication by VLAN
7. Bind a VPN instance of the specified multicast service to the main interface on a BRAS.
8. Enable IGMP and PIM on the main interface of the BRAS.
Definition
Layer 2 multicast implements on-demand multicast data transmission on the data link layer.
Figure 1-1032 shows a typical Layer 2 multicast application where Device B functions as a
Layer 2 device. After Layer 2 multicast is deployed on Device B, it listens to Internet Group
Management Protocol (IGMP) packets exchanged between Device A (a Layer 3 device) and
hosts and creates a Layer 2 multicast forwarding table. Then, Device B forwards multicast
data only to users who have explicitly requested the data, instead of broadcasting the data.
Purpose
Layer 2 multicast is designed to reduce network bandwidth consumption. For example,
without Layer 2 multicast, Device B cannot know which interfaces are connected to multicast
receivers. Therefore, after receiving a multicast packet from Device A, Device B broadcasts
the packet in the packet's broadcast domain. As a result, all hosts in the broadcast domain
(including those who do not request the packet) will receive the packet, which wastes network
bandwidth and compromises network security.
With Layer 2 multicast, Device B can create a Layer 2 multicast forwarding table and record
the mapping between multicast group addresses and interfaces in the table. After receiving a
multicast packet, Device B searches the forwarding table for downstream interfaces that map
to the packet's group address, and forwards the packet only to these interfaces. A multicast
group address can be a multicast IP address or a mapped multicast MAC address.
Functions
Major Layer 2 multicast functions include:
IGMP snooping
Static Layer 2 multicast
Layer 2 SSM mapping
IGMP snooping proxy
Multicast VLAN
Layer 2 multicast entry limit
Layer 2 Multicast Instance
Multicast Listener Discovery Snooping
Benefits
Layer 2 multicast offers the following benefits:
Reduced network bandwidth consumption
Lower performance requirements on Layer 3 devices
Improved multicast data security
Improved user service quality
1.11.11.2 Principles
1.11.11.2.1 IGMP Snooping
Background
Layer 3 devices and hosts use IGMP to implement multicast data communication. IGMP
messages are encapsulated in IP packets. A Layer 2 device can neither process Layer 3
information nor learn multicast MAC addresses in link layer data frames because source
MAC addresses in data frames are not multicast MAC addresses. As a result, when a Layer 2
device receives a data frame in which the destination MAC address is a multicast MAC
address, the device cannot find a matching entry in its MAC address table. The Layer 2 device
then broadcasts the multicast packet, which wastes bandwidth resources and compromises
network security.
IGMP snooping addresses this problem by controlling multicast traffic forwarding at Layer 2.
IGMP snooping enables a Layer 2 device to listen to and analyze IGMP messages exchanged
between a Layer 3 device and hosts. Based on the learned IGMP message information, the
device creates a Layer 2 forwarding table and uses it to implement on-demand packet
forwarding.
Figure 1-1033 shows a network on which Device B is a Layer 2 device and users connected to
Port 1 and Port 2 require multicast data from a multicast group (for example, 225.0.0.1).
If Device B does not run IGMP snooping, Device B broadcasts all received multicast
data at the data link layer.
If Device B runs IGMP snooping and receives data for a multicast group, Device B
searches the Layer 2 multicast forwarding table for ports connected to the users who
require the data. In this example, Device B sends the data only to Port 1 and Port 2
because the user connected to Port 3 does not require the data.
Figure 1-1033 Multicast packet transmission before and after IGMP snooping is configured on a
Layer 2 device
Related Concepts
Figure 1-1034 illustrates IGMP snooping on a Layer 2 multicast network.
A router port (labeled with a blue circle in Figure 1-1034): It connects a Layer 2
multicast device to an upstream multicast router.
Router ports can be dynamically discovered by IGMP or manually configured.
A member port of a multicast group (labeled with a yellow square in Figure 1-1034): It
connects a Layer 2 multicast device to group member hosts and is used by a Layer 2
multicast device to send multicast packets to hosts.
Member ports can be dynamically discovered by IGMP or manually configured.
A Layer 2 multicast forwarding entry: It is stored in the multicast forwarding table and
used by a Layer 2 multicast device to determine the forwarding of a multicast packet sent
from an upstream device. Information in a Layer 2 multicast forwarding entry includes:
− VLAN ID or VSI name
− Multicast source address
− Multicast group address
− Member ports connected to hosts
Figure 1-1035 Mapping between an IP multicast address and a multicast MAC address
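The mapping shown in Figure 1-1035 builds the MAC address from the fixed multicast prefix 01:00:5e plus the low-order 23 bits of the IPv4 group address. The following Python sketch (an illustration, not device code) shows the rule:

```python
def ip_to_multicast_mac(ip: str) -> str:
    """Map an IPv4 multicast address to its Ethernet multicast MAC.

    The MAC is the fixed prefix 01:00:5e followed by the low-order
    23 bits of the IP address (the high bit of the second octet is
    dropped).
    """
    octets = [int(o) for o in ip.split(".")]
    assert 224 <= octets[0] <= 239, "not an IPv4 multicast address"
    return "01:00:5e:{:02x}:{:02x}:{:02x}".format(
        octets[1] & 0x7F,  # keep only the low 7 bits of the second octet
        octets[2],
        octets[3],
    )

print(ip_to_multicast_mac("225.0.0.1"))    # 01:00:5e:00:00:01
print(ip_to_multicast_mac("224.128.0.1"))  # 01:00:5e:00:00:01 (same MAC)
```

Because 9 high-order bits of the group address are discarded, 32 IPv4 group addresses share one multicast MAC address, which is why forwarding on IP group addresses is more precise than forwarding on mapped MAC addresses.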
Implementation
IGMP snooping is implemented as follows:
1. After IGMP snooping is deployed on a Layer 2 device, the device uses IGMP snooping
to analyze IGMP messages exchanged between hosts and a Layer 3 device and then
creates a Layer 2 multicast forwarding table based on the analysis. Information in
forwarding entries includes VLAN IDs or VSI names, multicast source addresses,
multicast group addresses, and numbers of ports connected to hosts.
− After receiving an IGMP Query message from an upstream device, the Layer 2
device sets a network-side port as a dynamic router port.
− After receiving a PIM Hello message from an upstream device, the Layer 2 device
sets a network-side port as a dynamic router port.
− After receiving an IGMP Report message from a downstream device or user, the
Layer 2 device sets a user-side port as a dynamic member port.
2. The IGMP snooping-capable Layer 2 device forwards a received packet based on the
Layer 2 multicast forwarding table.
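The table-building and forwarding steps above can be modeled with a toy Python sketch. The port names, VLAN ID, and group address are illustrative only, not the device implementation:

```python
from collections import defaultdict

class IgmpSnoopingTable:
    """Toy model of a Layer 2 multicast forwarding table keyed by
    (VLAN, group address)."""
    def __init__(self):
        self.entries = defaultdict(set)  # (vlan, group) -> member ports
        self.router_ports = set()

    def on_query_or_hello(self, port):
        # IGMP Query or PIM Hello from upstream: mark a dynamic router port
        self.router_ports.add(port)

    def on_report(self, vlan, group, port):
        # IGMP Report from a downstream host: mark a dynamic member port
        self.entries[(vlan, group)].add(port)

    def forward(self, vlan, group):
        # Forward only to ports that requested the group; an unknown
        # group yields no member ports instead of being broadcast.
        return sorted(self.entries.get((vlan, group), set()))

t = IgmpSnoopingTable()
t.on_report(10, "225.0.0.1", "Port1")
t.on_report(10, "225.0.0.1", "Port2")
print(t.forward(10, "225.0.0.1"))  # ['Port1', 'Port2']
print(t.forward(10, "225.0.0.9"))  # [] -> no receivers, data not sent
```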
Other Functions
IGMP snooping supports all IGMP versions.
IGMP has three versions: IGMPv1, IGMPv2, and IGMPv3. You can specify an IGMP
version for your device.
IGMP snooping enables a Layer 2 device to rapidly respond to Layer 2 network topology
changes.
Multiple Spanning Tree Protocol (MSTP) is usually used to connect Layer 2 devices to
implement rapid convergence. IGMP snooping adapts to this feature by enabling a Layer
2 device to immediately update port information and switch multicast data traffic over a
new forwarding path when the network topology changes, which minimizes multicast
service interruptions.
IGMP snooping allows you to configure a security policy for multicast groups.
This function can be used to limit the range and number of multicast groups that users
can join and to determine whether to receive multicast data packets containing a security
field. It provides refined control over multicast groups and improves network security.
Deployment Scenarios
IGMP snooping can be used on VLANs and virtual private LAN service (VPLS) networks.
Benefits
IGMP snooping deployed on a user-side router offers the following benefits:
Reduced bandwidth consumption
Independent accounting for individual hosts
Background
Multicast data can be transmitted to user terminals over an IP bearer network in either
dynamic or static multicast mode.
In dynamic multicast mode, a device starts to receive and deliver a multicast group's data
after receiving the first Report message for the group. The device stops receiving the
multicast group's data after receiving the last Leave message. The dynamic multicast
mode has both an advantage and a disadvantage:
− Advantage: It reduces bandwidth consumption by restricting multicast traffic.
− Disadvantage: It introduces a delay when a user switches a channel.
In static multicast mode, multicast forwarding entries are configured for each multicast
group on a device. A multicast group's data is delivered to a device, regardless of
whether users are requesting the data from this device. The static multicast mode has the
following advantages and disadvantages:
− Advantages:
Multicast routes are fixed, and multicast paths exist regardless of whether there
are multicast data receivers. Users can change channels without delays,
improving user experience.
Multicast source and group ranges are easy to manage because multicast paths
are stable.
The delay when data is first forwarded is minimal because static routes already
exist and do not need to be established the way dynamic multicast routes do.
− Disadvantages:
Each device on a multicast data transmission path must be manually
configured. The configuration workload is heavy.
Sub-optimal multicast forwarding paths may be generated because
downstream ports are manually specified on each device.
When a network topology or unicast routes change, static multicast paths may
need to be reconfigured. The configuration workload is heavy.
Multicast routes exist even when no multicast data needs to be forwarded. This
wastes network resources and creates high bandwidth requirements.
A Layer 2 multicast forwarding table can be dynamically built using IGMP snooping or be
manually configured. Choose the dynamic or static mode based on network quality
requirements and the types of services to be delivered.
If network bandwidth is sufficient and hosts require multicast data for specific multicast
groups from a router port for a long period of time, choose static Layer 2 multicast to
implement stable multicast data transmission on a metropolitan area network (MAN) or
bearer network. After static Layer 2 multicast is deployed on a device, multicast entries on the
device do not age and users attached to the device can stably receive multicast data for
specific multicast groups.
Related Concepts
Static router ports or member ports are used in static Layer 2 multicast.
Static router ports are used to receive multicast traffic.
Static member ports are used to send data for specific multicast groups.
Benefits
Static Layer 2 multicast offers the following benefits:
Simplified network management
Reduced network delays
Improved information security by preventing unregistered users from receiving multicast
packets
Background
IGMPv3 supports source-specific multicast (SSM), but IGMPv1 and IGMPv2 do not. The
majority of the latest multicast devices support IGMPv3, but most legacy multicast terminals
only support IGMPv1 or IGMPv2. SSM mapping is a transition solution that provides SSM
services for such legacy multicast terminals. Using rules that specify the mapping from a
particular multicast group to a source-specific group, SSM mapping can convert IGMPv1 or
IGMPv2 messages whose group addresses are within the SSM range to IGMPv3 messages.
This mechanism allows hosts running IGMPv1 or IGMPv2 to access SSM services. SSM
mapping allows IGMPv1 or IGMPv2 terminals to access only specific sources, thus
minimizing the risks of attacks on multicast sources.
Layer 2 SSM mapping is used to implement SSM mapping on Layer 2 networks. For example,
on the network shown in Figure 1-1036, the Layer 3 device runs IGMPv3 and directly
connects to a Layer 2 device. Host A runs IGMPv3, Host B runs IGMPv2, and Host C runs
IGMPv1 on the Layer 2 network. If the IGMP versions of Host B and Host C cannot be
upgraded to IGMPv3, Layer 2 SSM mapping needs to be configured on the Layer 2 device to
provide SSM services for all hosts on the network segment.
Implementation
If SSM mapping is configured on a multicast device and mapping between group addresses
and source addresses is configured, the multicast device will perform the following actions
after receiving a (*, G) message from a host running IGMPv1 or IGMPv2:
If the message's multicast group address is not in the SSM group address range, the
device processes the message in the same manner as it processes an IGMPv1 or IGMPv2
message.
If the message's multicast group address is in the SSM group address range, the device
maps the (*, G) message into (S, G) messages based on mapping rules.
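The two-branch conversion rule can be sketched in Python. The range 232.0.0.0/8 is the well-known IPv4 SSM address block; the mapping rules and source addresses below are hypothetical examples, not values taken from this product:

```python
SSM_RANGE = ("232.0.0.0", "232.255.255.255")  # well-known IPv4 SSM range
# Hypothetical configured mapping rules: group -> list of sources
MAPPING_RULES = {"232.1.1.1": ["10.1.1.1", "10.2.2.2"]}

def ip_tuple(ip):
    return tuple(int(o) for o in ip.split("."))

def ssm_map(group):
    """Convert a (*, G) membership from an IGMPv1/v2 host into
    (S, G) states when G falls in the SSM range."""
    if not (ip_tuple(SSM_RANGE[0]) <= ip_tuple(group) <= ip_tuple(SSM_RANGE[1])):
        return None                       # process as an ordinary IGMPv1/v2 message
    sources = MAPPING_RULES.get(group, [])
    return [(s, group) for s in sources]  # (S, G) states, as if from IGMPv3

print(ssm_map("225.0.0.1"))  # None: outside the SSM range
print(ssm_map("232.1.1.1"))  # [('10.1.1.1', '232.1.1.1'), ('10.2.2.2', '232.1.1.1')]
```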
Benefits
Layer 2 SSM mapping offers the following benefits:
Enables IGMPv1/v2 terminal users to enjoy SSM services.
Better protects multicast sources against attacks.
Background
Forwarding entries are generated when a Layer 3 device (PE on the network shown in Figure
1-1037) exchanges IGMP messages with user hosts. If there are many user hosts, excessive
IGMP messages will reduce the forwarding capability of the Layer 3 device.
To resolve this issue, deploy IGMP snooping proxy on a Layer 2 device (CE on the network
shown in Figure 1-1037) that connects the Layer 3 device and hosts. IGMP snooping proxy
enables a Layer 2 device to behave as both a Layer 3 device and a user host, so that the Layer
2 device can terminate IGMP messages to be transmitted between the Layer 3 device and user
host. IGMP snooping proxy enables a Layer 2 device to perform the following operations:
Periodically send Query messages to hosts and receive Report and Leave messages from
hosts.
Maintain group member relationships.
Send Report and Leave messages to a Layer 3 device.
Forward multicast traffic only to hosts who require it.
After IGMP snooping proxy is deployed on a Layer 2 device, the Layer 2 device is no longer
a transparent message forwarder between a Layer 3 device and user hosts. Furthermore,
the Layer 3 device only recognizes the Layer 2 device and is unaware of user hosts.
Implementation
A device that runs IGMP snooping proxy establishes and maintains a multicast forwarding
table and sends multicast data to users based on this table. IGMP snooping proxy implements
the following functions:
IGMP snooping proxy implements the querier function for upstream devices, enabling a
Layer 2 device to send Query messages on behalf of its interworking upstream device.
The querier function must be enabled on a Layer 2 device, either directly or by enabling
IGMP snooping proxy, if the interworking upstream device cannot send IGMP Query
messages or if static multicast groups are configured on the upstream device.
IGMP snooping proxy enables a Layer 2 device to suppress Report and Leave messages
if large numbers of users frequently join or leave multicast groups. This function reduces
message processing workload for upstream devices.
− After receiving the first Report message for a multicast group from a user host, the
device checks whether an entry has been created for this group. If an entry has not
been created, the device sends the Report message to its upstream device and
creates an entry for this group. If an entry has been created, the device adds the host
to the multicast group and does not send a Report message to its upstream device.
− After receiving a Leave message for a group from a user host, the device sends a
group-specific query message to check whether there are any members of this group.
If there are members of this group, the device deletes only the user from the group.
If there are no other members of this group, the device considers the user as the last
member of the group and sends a Leave message to its upstream device.
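The Report/Leave suppression described above can be modeled with a short Python sketch. It is a simplification: it omits the group-specific query and its timer, and treats the last Leave as immediately final:

```python
class SnoopingProxy:
    """Toy sketch of Report/Leave suppression on a Layer 2 device."""
    def __init__(self):
        self.members = {}   # group -> set of member hosts
        self.upstream = []  # messages actually forwarded upstream

    def on_report(self, group, host):
        first = group not in self.members     # first Report for this group?
        self.members.setdefault(group, set()).add(host)
        if first:                             # only the first Report goes up
            self.upstream.append(("Report", group))

    def on_leave(self, group, host):
        self.members.get(group, set()).discard(host)
        if not self.members.get(group):       # last member left the group
            self.members.pop(group, None)
            self.upstream.append(("Leave", group))

p = SnoopingProxy()
p.on_report("225.0.0.1", "HostA")
p.on_report("225.0.0.1", "HostB")   # suppressed: entry already exists
p.on_leave("225.0.0.1", "HostA")    # suppressed: HostB still a member
p.on_leave("225.0.0.1", "HostB")    # last member: Leave goes upstream
print(p.upstream)  # [('Report', '225.0.0.1'), ('Leave', '225.0.0.1')]
```

However many hosts join or leave, the upstream device sees at most one Report and one Leave per group, which is the workload reduction the feature provides.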
Deployment Scenarios
IGMP snooping proxy can be used on VLANs and VPLS networks.
Benefits
IGMP snooping proxy deployed on a user-side Layer 2 router offers the following benefits:
Reduced bandwidth consumption
Reduced workload on Layer 3 devices directly connected to the Layer 2 router
Background
In traditional multicast on-demand mode, if users in different VLANs require the same
multicast flow, an upstream device of a Layer 2 device must send a copy of the multicast flow
for each user. This mode wastes bandwidth and imposes additional burdens on the upstream
device.
The multicast VLAN function can be used to address this problem. With the help of IGMP
snooping, the multicast VLAN function moves the multicast replication point downstream to
an edge device, so that only one multicast flow is replicated on an upstream device for
different VLANs that require the same flow.
For example, on the network shown in Figure 1-1038, the CE is a Layer 2 device, and the PE
is an upstream device of the CE. Users in VLANs 11 and 12 require the same multicast flow
from the CE. After multicast VLAN 3 is configured on the CE, the PE sends only one copy of
the multicast flow to VLAN 3. The CE then sends a copy of the multicast flow to VLAN 11
and VLAN 12. The PE no longer needs to send identical multicast data flows downstream.
This mode saves network bandwidth and relieves the load on the PE.
Figure 1-1038 Multicast flow replication before and after multicast VLAN is configured
The following uses the network shown in Figure 1-1038 as an example to describe why
multicast VLAN requires IGMP snooping proxy to be enabled.
If IGMP snooping proxy is not enabled on VLAN 3 and users in different VLANs want
to join the same group, the CE forwards each user's IGMP Report message to the PE.
Similarly, if users in different VLANs leave the same group, the CE also needs to
forward each user's IGMP Leave message to the PE.
If IGMP snooping proxy is enabled on VLAN 3 and users in different VLANs want to
join the same group, the CE forwards only one IGMP Report message to the PE. If the
last member of the group leaves, the CE sends an IGMP Leave message to the PE. This
reduces network-side bandwidth consumption on the CE and performance pressure on
the PE.
Related Concepts
The following concepts are involved in the multicast VLAN function:
Multicast VLAN: is a VLAN to which the interface connected to a multicast source
belongs. A multicast VLAN is used to aggregate multicast flows.
User VLAN: is a VLAN to which a group member host belongs. A user VLAN is used to
receive multicast flows from a multicast VLAN.
Implementation
The multicast VLAN implementation process can be divided into two parts:
Protocol packet forwarding
− After the user VLAN tag in an IGMP Report message is replaced with a
corresponding multicast VLAN tag, the message is sent out through a router port of
the multicast VLAN.
− After the multicast VLAN tag in an IGMP Query message is replaced with a
corresponding user VLAN tag, the message is sent out through a member port of
the user VLAN.
− Entries learned through IGMP snooping in user VLANs are added to the table of the
multicast VLAN.
Multicast data forwarding
After receiving a multicast data packet from an upstream device, a Layer 2 device
searches its multicast forwarding table for a matching entry.
− If a matching forwarding entry exists, the Layer 2 device will identify the
downstream ports and their VLAN IDs, replicate the multicast data packet on each
downstream port, and send a copy of the packet to user VLANs.
− If no matching forwarding entry exists, the Layer 2 device will discard the multicast
data packet.
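The tag replacement and replication steps can be illustrated with a toy Python sketch using the VLAN numbers from Figure 1-1038 (VLAN 3 as the multicast VLAN, VLANs 11 and 12 as user VLANs); the port names are illustrative:

```python
MCAST_VLAN = 3          # multicast VLAN: aggregates flows from the source
USER_VLANS = {11, 12}   # user VLANs: receive flows from the multicast VLAN

def upstream_report(msg):
    """The user VLAN tag in an IGMP Report is replaced with the
    multicast VLAN tag before the message goes out a router port."""
    if msg["vlan"] in USER_VLANS:
        return {**msg, "vlan": MCAST_VLAN}
    return msg

def replicate_data(packet, member_ports):
    """One flow arrives on the multicast VLAN; a copy is made per
    downstream member port, re-tagged with that port's user VLAN."""
    return [{**packet, "vlan": vlan, "port": port}
            for port, vlan in member_ports]

print(upstream_report({"type": "Report", "vlan": 11})["vlan"])  # 3
copies = replicate_data({"group": "225.0.0.1", "vlan": MCAST_VLAN},
                        [("Port1", 11), ("Port2", 12)])
print([c["vlan"] for c in copies])  # [11, 12]
```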
Other Functions
A user VLAN allows you to configure the querier election function. The following uses the
network shown in Figure 1-1039 as an example to describe the querier election function.
On the network shown in Figure 1-1039:
A CE connects to Router A through both Router B and Router C, which improves the
reliability of data transmission. The querier function is enabled on Router B and Router
C.
Multicast VLAN is enabled on Router B and Router C. VLAN 11 is a multicast VLAN,
and VLAN 22 is a user VLAN.
Both Router B and Router C in VLAN 11 are connected to VLAN 22. As a result, VLAN 22
will receive two identical copies for the same requested multicast flow from Router B and
Router C, causing data redundancy.
To address this problem, configure querier election on Router B and Router C in the user
VLAN and specify one of them to send Query messages and forward multicast data flows. In
this manner, VLAN 22 receives only one copy of a multicast data flow from the upstream
Router A over VLAN 11.
A querier is elected as follows in a user VLAN (the network shown in Figure 1-1039 is used
as an example):
1. After receiving a Query message from Router A, Router B and Router C replace the
source IP address of the Query message with their own local source IP address (1.1.1.1
for Router B and 1.1.1.2 for Router C).
2. Router B and Router C exchange Query messages. Based on the querier election
algorithm, Router B with a smaller source IP address is elected as a querier.
3. As a querier, Router B generates a forwarding entry after receiving a Join message from
VLAN 22, while Router C does not generate a forwarding entry. Then, multicast data
flows from upstream devices are forwarded by Router B to VLAN 22.
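The election rule in step 2, where the candidate with the numerically smaller source IP address wins, can be sketched as follows (router names and addresses taken from the example above):

```python
import ipaddress

def elect_querier(candidates):
    """Elect the querier: the candidate whose source IP address is
    numerically smallest wins, per the IGMP querier election rule."""
    return min(candidates, key=lambda c: ipaddress.IPv4Address(c[1]))

routers = [("RouterB", "1.1.1.1"), ("RouterC", "1.1.1.2")]
print(elect_querier(routers))  # ('RouterB', '1.1.1.1')
```

Only the elected querier (Router B here) generates forwarding entries for Joins from the user VLAN, so VLAN 22 receives a single copy of each flow.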
Deployment Scenarios
The multicast VLAN function can be used on VLANs.
Benefits
The multicast VLAN function offers the following benefits:
Reduced bandwidth consumption
Reduced workloads for Layer 3 devices
Simplified management of multicast sources and multicast group members
Principles
With the growing popularity of IPTV applications, multicast services are more widely
deployed than ever. When multicast services are deployed on a Layer 2 network, a number of
problems may arise:
If users join a large number of multicast groups, sparsely distributed multicast groups
will increase performance pressure on network devices.
If network bandwidth is insufficient, the demand for bandwidth resources will exceed the
total network bandwidth, overloading aggregation layer devices and degrading user
experience.
If multicast packets are used to attack a network, network devices become busy
processing attack packets and cannot respond to normal network requests.
On the network shown in Figure 1-1040, Layer 2 multicast entry limit can be deployed on the
UPE and NPEs to address the problems described above. The Layer 2 multicast entry limit
function limits entries of multicast services on a Layer 2 network. This function implements
multicast service access restrictions and refined control on the aggregation network based on
the number of multicast groups. Layer 2 multicast entry limit also enables service providers to
refine content offerings and develop flexible subscriber-specific policies. This prevents the
demand for bandwidth resources from exceeding the total bandwidth of the aggregation
network and improves service quality for users.
Related Concepts
Entry limit: provides rules to limit the number of multicast groups, implementing control over
multicast entry learning.
Implementation
If IGMP snooping is enabled, Layer 2 multicast entry limit can be used to control multicast
services. Multicast entry limit constrains the generation of multicast forwarding entries. When
a specified threshold is reached, no more forwarding entries will be generated. This conserves
the processing capacity of devices and controls link bandwidth.
Layer 2 multicast entry limit can be classified by usage scenario as follows:
VLAN scenario:
− Layer 2 multicast entry limit in a VLAN
− Layer 2 multicast entry limit on an interface
− Layer 2 multicast entry limit in a VLAN on a specified interface
VPLS scenario:
− Layer 2 multicast entry limit in a VSI
− Layer 2 multicast entry limit on a sub-interface
− Layer 2 multicast entry limit on a PW
Layer 2 multicast entry limit can restrict the following items:
Number of multicast groups
The number of multicast groups allowed can be limited when a device creates Layer 2
multicast forwarding entries. This protects device and network performance by limiting
the number of groups available for users to join. After IGMP Report messages are
received from downstream user hosts, the device checks entry limit statistics to
determine whether the threshold for the number of multicast groups has been reached. If
the threshold has not been reached, a forwarding entry is generated and entry limit
statistics are updated to show the increase in groups. If the threshold has been reached,
no entry is generated. When IGMP Leave messages are received or entries age, the
entries are deleted and entry limit statistics are updated.
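The threshold check described above can be modeled with a minimal Python sketch; the limit value and group addresses are illustrative:

```python
class EntryLimiter:
    """Toy sketch: refuse new Layer 2 forwarding entries once a
    configured group-count threshold is reached."""
    def __init__(self, limit):
        self.limit = limit
        self.groups = set()   # groups with forwarding entries

    def on_report(self, group):
        if group in self.groups:
            return True                  # entry already exists: accept
        if len(self.groups) >= self.limit:
            return False                 # threshold reached: no new entry
        self.groups.add(group)           # create entry, update statistics
        return True

    def on_leave(self, group):
        self.groups.discard(group)       # entry deleted, statistics updated

lim = EntryLimiter(limit=2)
print([lim.on_report(g) for g in ("225.0.0.1", "225.0.0.2", "225.0.0.3")])
# [True, True, False]
lim.on_leave("225.0.0.1")
print(lim.on_report("225.0.0.3"))  # True: a slot was freed by the Leave
```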
Deployment Scenarios
Layer 2 multicast entry limit can be used on VLANs and VPLS networks.
Benefits
Layer 2 multicast entry limit offers the following benefits:
Prevents required bandwidth resources from exceeding the total bandwidth of the
aggregation network and improves service quality for users.
Improves multicast service security.
Background
With the growing popularity of IPTV applications, multicast services are more widely
deployed than ever. When multicast services are deployed on a Layer 2 network, a number of
problems may arise:
If users join a large number of multicast groups, sparsely distributed multicast groups
will increase performance pressure on network devices.
If network bandwidth is insufficient, the demand for bandwidth resources will exceed the
total bandwidth of the network, overloading aggregation layer devices and degrading
user experience.
If multicast packets are used to attack a network, network devices become busy
processing attack packets and cannot respond to normal network requests.
If static multicast group management policies are used, user requests for access to a
variety of different multicast services cannot be met. Service providers expect more
refined channel management. For example, they expect to limit the number and
bandwidth of multicast groups in channels.
On the network shown in Figure 1-1041, Layer 2 multicast CAC can be deployed on the UPE
and NPEs to address the problems described above. Layer 2 multicast CAC controls multicast
services on the aggregation network based on different criteria, including the multicast group
quantity and bandwidth limit for a channel or sub-interface. Layer 2 multicast CAC enables
service providers to refine content offerings and develop flexible subscriber-specific policies.
This prevents the demand for bandwidth resources from exceeding the total bandwidth of the
aggregation network and ensures service quality for users.
Related Concepts
The following concepts are involved in multicast CAC.
Call Admission Control (CAC): provides a series of rules for controlling multicast entry
learning, including the multicast group quantity and bandwidth limits for each multicast
group, as well as for each channel. Layer 2 multicast CAC is used to perform CAC
operations for multicast services on Layer 2 networks.
Channel: consists of a series of multicast groups, each of which can have its own
bandwidth attribute. For example, a TV channel may consist of two groups, TV-1 and TV-5,
with bandwidths of 4 Mbit/s and 18 Mbit/s, respectively.
Implementation
Layer 2 multicast CAC constrains the generation of multicast forwarding entries. When a
preset threshold is reached, no more forwarding entries can be generated. This ensures that
devices have adequate processing capabilities and controls link bandwidth.
Layer 2 multicast CAC can restrict the following items:
Restriction on the number and bandwidth of multicast groups
The number of multicast groups allowed can be limited when a device creates Layer 2
multicast forwarding entries. This protects device and network performance by limiting
the number of groups available for users to join. After IGMP Report messages are
received from downstream user hosts, the device checks CAC statistics to determine
whether the threshold for the number of multicast groups has been reached. If the
threshold has not been reached, a forwarding entry is generated and CAC statistics are
updated to show the increase in groups. If the threshold has been reached, no entry is
generated. When IGMP Leave messages are received or entries age, the entries are
deleted and CAC statistics are updated.
If the bandwidth of each multicast group is fixed and each group uses approximately the
same amount of bandwidth, the total bandwidth for multicast traffic is basically fixed.
For example, if there are 20 multicast groups and each multicast group has 4 kbit/s of
bandwidth, the total bandwidth for multicast traffic is 80 kbit/s. If there are 20 multicast
groups and the bandwidth values of the multicast groups are different, some being 4
kbit/s and the others being 18 kbit/s, the total bandwidth for multicast traffic cannot be
determined. In a case like this, setting a limit on the number of multicast groups is not
adequate to control bandwidth. Bandwidth usage must be limited.
Restriction on the number and bandwidth of multicast groups in a channel
If a network offers channels for different content providers, the number of multicast
groups and the amount of bandwidth must be limited based on channels.
Before a Layer 2 multicast entry is generated, the multicast group address must be
checked to determine which channel's address range to which this address belongs.
Whether CAC is configured for the address range needs to be checked also. If CAC is
configured for the address range and the number or bandwidth of member multicast
groups exceeds the upper threshold, the Layer 2 entry will not be generated. The Layer 2
entry will be generated only if the number or bandwidth of member multicast groups is
below the upper threshold.
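The channel lookup and per-channel check can be sketched as follows. The channel names, address ranges, and limits are purely illustrative assumptions.

```python
import ipaddress

# Hypothetical channel plan: each channel covers a group-address range
# and carries its own CAC limits.
CHANNELS = {
    "sports": {"range": ipaddress.ip_network("225.0.0.0/28"),
               "max_groups": 8, "max_bw_kbps": 64},
    "news":   {"range": ipaddress.ip_network("225.0.1.0/28"),
               "max_groups": 4, "max_bw_kbps": 16},
}

def find_channel(group_addr):
    """Return the channel whose address range contains the group, if any."""
    addr = ipaddress.ip_address(group_addr)
    for name, chan in CHANNELS.items():
        if addr in chan["range"]:
            return name
    return None

def admit(group_addr, current_groups, current_bw_kbps, group_bw_kbps):
    """Decide whether a Layer 2 entry may be generated for this group."""
    name = find_channel(group_addr)
    if name is None:
        return True  # no channel CAC configured for this address range
    chan = CHANNELS[name]
    return (current_groups + 1 <= chan["max_groups"] and
            current_bw_kbps + group_bw_kbps <= chan["max_bw_kbps"])
```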
Deployment Scenarios
Layer 2 multicast CAC applies to VPLS networks.
Benefits
The Layer 2 multicast CAC feature provides the following benefits:
For providers:
− Provides channel-based restrictions, allowing service providers to implement
refined multicast service management.
− Improves multicast service security.
For users:
− Prevents the required bandwidth from exceeding the total bandwidth of the
aggregation network and ensures service quality for users.
Principles
Multicast services have relatively high demands for real-time transmissions. To ensure
uninterrupted delivery of multicast services, master and backup links and devices are
deployed on a VPLS network with a UPE dual-homed to SPEs. In the networking shown in
Figure 1-1042, a UPE is connected to two SPEs through a VPLS network. The PWs between
the UPE and SPEs work in master/backup mode. Multicast services are delivered from a
multicast source to users attached to the UPE.
This networking allows unicast services to be transmitted properly, but there are problems
with the transmission of multicast services. Multicast protocol and data packets are blocked
on the backup PW and this prevents the backup SPE (SPE2) from learning multicast
forwarding entries. As a result, SPE2 has no forwarding entries, and, in the event of a
master/backup SPE switchover, it cannot begin forwarding multicast data traffic immediately.
The PE must first resend an IGMP Query message and users attached to the UPE must reply
with Report messages before SPE2 can learn multicast forwarding entries through the backup
PW and resume the forwarding of multicast data packets. As a result, services are interrupted
on the UPE for a long period of time, and network reliability is adversely affected.
If the primary and secondary PWs in this networking are hub PWs, split horizon still takes effect,
meaning that protocol and data packets are not transmitted from the primary PW to the secondary PW.
To address this problem, rapid multicast traffic forwarding is configured on the backup device,
SPE2. SPE2 sends an IGMP Query message to the UPE along the backup PW, and receives an
IGMP Report message from the UPE to create a Layer 2 multicast forwarding table. Although
the backup PW cannot be used to forward multicast data traffic, it can be used by SPE2 to
send an IGMP Query message. If there is a switchover and the backup PW becomes the
master, SPE2 has a Layer 2 multicast forwarding table ready to use and can begin forwarding
multicast data traffic immediately. This ensures uninterrupted delivery of multicast services.
Related Concepts
The following concepts are involved in rapid multicast data forwarding on a backup device:
Master and backup devices
Of the two devices to which a device directly connected to user hosts is dual-homed
through a VPLS network, the working device is the master, and the device that protects
the working device is the backup.
Primary and backup links
The physical link between the device directly connected to user hosts and the master
device is the primary link. The physical link between the device directly connected to
user hosts and backup device is the backup link.
Primary and backup PWs
The PW between the device directly connected to user hosts and the master device is the
primary PW. The PW between the device directly connected to user hosts and the backup
device is the backup PW.
Other Functions
If the upstream and downstream devices (PE and UPE) are not allowed to receive IGMP
messages that carry the same source MAC address but are sent from different interfaces, the
backup device needs to be configured to replace the source MAC addresses carried in IGMP
messages.
After rapid multicast traffic forwarding is configured, the UPE receives IGMP Query
messages from both SPE1 and SPE2. Both messages carry the same MAC address. If
MAC-flapping or MAC address authentication has been configured on the UPE, protocol
packets that are received by the UPE through different interfaces but carry the same
source MAC address will be filtered out. The backup SPE can be configured to change
the source MAC addresses of packets to its MAC address before sending IGMP Query
messages along the backup PW. This allows the UPE to learn two different router ports
and send IGMP Report and Leave messages from attached users to SPE1 and SPE2.
Similarly, if MAC-flapping or MAC address authentication has been configured on the
PE, the backup SPE needs to be configured to change the source MAC addresses of
received IGMP Report or Leave messages to its MAC address before sending them to
the PE.
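The source MAC replacement amounts to rewriting the source address field of the Ethernet header before the message is sent on the backup PW. The following sketch shows only that rewrite on a raw frame; it ignores FCS recomputation and any encapsulation, and the addresses used are hypothetical.

```python
def replace_source_mac(frame: bytes, new_src_mac: bytes) -> bytes:
    """Rewrite the source MAC (bytes 6..12 of the Ethernet header) so that
    protocol packets relayed by the backup SPE carry its own MAC address."""
    if len(frame) < 12 or len(new_src_mac) != 6:
        raise ValueError("malformed frame or MAC address")
    return frame[:6] + new_src_mac + frame[12:]

# Example: a truncated frame (destination MAC, source MAC, EtherType only).
frame = (bytes.fromhex("3333ff000001")      # destination MAC (multicast)
         + bytes.fromhex("00005e000101")    # original source MAC
         + b"\x86\xdd")                     # EtherType
rewritten = replace_source_mac(frame, bytes.fromhex("00005e000202"))
assert rewritten[6:12] == bytes.fromhex("00005e000202")
assert rewritten[:6] == frame[:6]  # destination MAC is unchanged
```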
Deployment Scenarios
Rapid multicast data forwarding on a backup device is used on VPLS networks that have a
device dual-homed to upstream devices through PWs.
Benefits
Rapid multicast data forwarding on a backup device provides the following benefit:
After a master/backup device switchover is performed, multicast data can be quickly
forwarded on the backup device. This ensures reliable multicast service transmission and
enhances user experience.
Background
In conventional multicast on-demand mode, if users of a Layer 2 multicast device in different
VLANs or VSIs request the same multicast group's data from the same source, the
connected upstream Layer 3 device has to send a copy of each multicast flow of this group for
each VLAN or VSI. Such implementation wastes bandwidth resources and burdens the
upstream device.
The Layer 2 multicast instance feature, which is an enhancement of multicast VLAN, resolves
these issues by allowing multicast data replication across VLANs and VSIs and supporting
multicast data transmission of the same multicast group across instances. These functions help
save bandwidth resources and simplify multicast group management. A Layer 2 network
supports multiple Layer 2 multicast instances. For example, on the network shown in Figure
1-1043, if users in VLAN 11 and VLAN 22 request multicast data from channels in the
range of 225.0.0.1 to 225.0.0.5, Layer 2 multicast instances can be deployed on the CE. Then,
the CE requests only a single copy of each multicast data flow through VLAN 3 from the
PE, replicates the multicast data flow, and sends a copy to each VLAN. This implementation
greatly reduces bandwidth consumption.
Layer 2 multicast instances allow devices to replicate multicast data flows across different
types of instances, such as flow replication from a VPLS to a VLAN or from a VLAN to a
VPLS.
Related Concepts
Multicast instance
An instance to which the interface connected to a multicast source belongs. A multicast
instance aggregates multicast flows.
User instance
An instance to which the interface connected to a multicast receiver belongs. A user
instance receives multicast flows from a multicast instance.
A multicast instance can be associated with multiple user instances.
Multicast channel
A multicast channel consists of one or more multicast groups. To facilitate service
management, multicast content providers generally operate different types of channels in
different Layer 2 multicast instances. Therefore, multicast channels need to be
configured for Layer 2 multicast instances.
Implementation
After receiving a multicast data packet from an upstream device, a Layer 2 device searches
for a matching entry in the multicast forwarding table based on the multicast instance ID and
the destination address (multicast group address) contained in the packet. If a matching
forwarding entry exists, the Layer 2 device obtains the downstream interfaces and the VLAN
IDs or VSI names, replicates the multicast data packet on each downstream interface, and
sends a copy of the packet to all involved user instances. If no matching forwarding entry exists, the
Layer 2 device broadcasts the multicast data packet in the local multicast VLAN or VSI. This
implementation is similar to multicast VLAN implementation.
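The lookup-and-replicate step can be sketched as a table keyed by multicast instance and group address. The instance names, interfaces, and addresses below are illustrative only.

```python
# (multicast instance, group address) -> list of (downstream interface, user instance)
fwd_table = {
    ("VLAN3", "225.0.0.1"): [("eth1", "VLAN11"), ("eth2", "VLAN22")],
}

def forward(instance, group, payload):
    """Replicate one upstream flow to every associated user instance,
    or flood in the local multicast VLAN/VSI if no entry matches."""
    key = (instance, group)
    if key in fwd_table:
        # One copy per downstream interface, retagged for each user instance.
        return [(iface, user, payload) for iface, user in fwd_table[key]]
    return [("flood", instance, payload)]  # no matching entry: broadcast
```

A single copy received in the multicast instance (VLAN 3 here) thus fans out to the user instances, which is how one upstream flow serves both VLAN 11 and VLAN 22.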
Usage Scenario
Layer 2 multicast instances apply to VLAN and VPLS networks.
Benefits
Layer 2 multicast instances bring the following benefits:
Reduced bandwidth consumption
Improved network security
Isolated unicast and multicast domains to prevent user traffic from affecting each other
Definition
Multicast Listener Discovery Snooping (MLD snooping) is an IPv6 Layer 2 multicast
protocol. The MLD snooping protocol maintains information about the outbound interfaces of
multicast packets by snooping multicast protocol packets exchanged between the Layer 3
multicast device and user hosts. MLD snooping manages and controls multicast packet
forwarding at the data link layer.
Purpose
Similar to an IPv4 multicast network, multicast data on an IPv6 multicast network (especially
on a LAN) has to pass through Layer 2 switching devices. As shown in Figure 1-1044, a
Layer 2 switch is located between multicast users and the Layer 3 multicast device, Router.
After receiving multicast packets from Router, Switch forwards the multicast packets to the
multicast receivers. The destination address of the multicast packets is a multicast group
address. Switch cannot learn multicast MAC address entries, so it broadcasts the multicast
packets in the broadcast domain. All hosts in the broadcast domain will receive the multicast
packets, regardless of whether they are members of the multicast group. This wastes network
bandwidth and threatens network security.
MLD snooping solves this problem. MLD snooping is a Layer 2 multicast protocol on the
IPv6 network. After MLD snooping is configured, Switch can snoop and analyze MLD
messages between multicast users and Router. The Layer 2 multicast device sets up Layer 2
multicast forwarding entries to control forwarding of multicast data. In this way, multicast
data is not broadcast on the Layer 2 network.
Principles
MLD snooping is a basic IPv6 Layer 2 multicast function that forwards and controls multicast
traffic at Layer 2. MLD snooping runs on a Layer 2 device and analyzes MLD messages
exchanged between a Layer 3 device and hosts to set up and maintain a Layer 2 multicast
forwarding table. The Layer 2 device forwards multicast packets based on the Layer 2
multicast forwarding table.
On an IPv6 multicast network shown in Figure 1-1045, after receiving multicast packets from
Router, Switch at the edge of the access layer forwards the multicast packets to receiver hosts.
If Switch does not run MLD snooping, it broadcasts multicast packets at Layer 2. After MLD
snooping is configured, Switch forwards multicast packets only to specified hosts.
With MLD snooping configured, Switch listens on MLD messages exchanged between
Router and hosts. It analyzes packet information (such as packet type, group address, and
receiving interface) to set up and maintain a Layer 2 multicast forwarding table, and forwards
multicast packets based on the Layer 2 multicast forwarding table.
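The table maintenance driven by snooped Report and Done messages can be sketched as follows; this is a simplified model (no timers or router-port handling), and the names are hypothetical.

```python
from collections import defaultdict

class MldSnooping:
    """Sketch: build a Layer 2 forwarding table by snooping MLD messages."""

    def __init__(self):
        self.table = defaultdict(set)  # (VLAN ID, group address) -> member ports

    def on_report(self, vlan, group, port):
        """A host joined: record the receiving port as a member port."""
        self.table[(vlan, group)].add(port)

    def on_done(self, vlan, group, port):
        """A host left: remove the member port; drop the entry when empty."""
        self.table[(vlan, group)].discard(port)
        if not self.table[(vlan, group)]:
            del self.table[(vlan, group)]

    def out_ports(self, vlan, group):
        """Outbound interfaces used when forwarding multicast data."""
        return self.table.get((vlan, group), set())
```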
Figure 1-1045 Multicast packet transmission before and after MLD snooping is configured on a
Layer 2 device
Concepts
As shown in Figure 1-1046, Router connects to the multicast source. MLD snooping is
configured on SwitchA and SwitchB. HostA, HostB, and HostC are receiver hosts.
Figure 1-1046 shows MLD snooping ports. The following table describes these ports.
The router port and member port are outbound interfaces in Layer 2 multicast forwarding
entries. A router port functions as an upstream interface, while a member port functions as a
downstream interface. Port information learned through protocol packets is saved as dynamic
entries, and port information manually configured is saved as static entries.
Besides the outbound interfaces, each entry includes multicast group addresses and VLAN
IDs.
Multicast group addresses can be multicast IP addresses or multicast MAC addresses
mapped from multicast IP addresses. In MAC address-based forwarding mode, multicast
data may be forwarded to hosts that do not require the data because multiple IP addresses
are mapped to the same MAC address. The IP address-based forwarding mode can
prevent this problem.
The VLAN ID specifies a Layer 2 broadcast domain. After multicast VLAN is
configured, the inbound VLAN ID is the multicast VLAN ID, and the outbound VLAN
ID is a user VLAN ID. If multicast VLAN is not configured, both the inbound and
outbound VLAN IDs are the ID of the VLAN to which a host belongs.
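The MAC-address ambiguity mentioned above follows from the mapping rule for IPv6 multicast MAC addresses (33-33 followed by the low 32 bits of the group address), so any two groups sharing those low 32 bits collide:

```python
import ipaddress

def ipv6_group_to_mac(group: str) -> str:
    """Map an IPv6 multicast address to its MAC address:
    the fixed prefix 33-33 plus the low 32 bits of the group address."""
    low32 = ipaddress.ip_address(group).packed[-4:]
    return "33-33-" + "-".join(f"{b:02x}" for b in low32)

# Two distinct groups with identical low 32 bits share one multicast MAC,
# which is why MAC-based forwarding can deliver traffic to hosts that did
# not request it; IP address-based forwarding avoids the collision.
assert ipv6_group_to_mac("ff05::1:3") == ipv6_group_to_mac("ff15::1:3")
```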
Implementation
After MLD snooping is configured, the Layer 2 multicast device processes the received MLD
protocol packets in different ways and sets up Layer 2 multicast forwarding entries.
NOTE
Aging time of a dynamic router port = Robustness variable × General query
interval + Maximum response time for General Query messages
A Multicast-Address-Specific Query or
Multicast-Address-and-Source-Specific Query message is forwarded to the
ports connected to members of the specific groups.
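As a worked example of the aging-time formula, assume common MLD defaults (these values are an assumption, not device defaults): robustness variable 2, general query interval 125 seconds, maximum response time 10 seconds.

```python
# Aging time of a dynamic router port, per the formula above.
robustness = 2         # robustness variable (assumed default)
query_interval = 125   # general query interval, in seconds (assumed default)
max_response = 10      # maximum response time for General Query messages, seconds

aging_time = robustness * query_interval + max_response
assert aging_time == 260  # seconds
```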
Upon receiving an IPv6 PIM Hello message, a Layer 2 device forwards the message to all
ports excluding the port that receives the Hello message. The Layer 2 device processes the
receiving port as follows:
If the port is included in the router port list, the device resets the aging timer of the router
port.
If the port is not in the router port list, the device adds it to the list and starts the aging
timer.
When the Layer 2 device receives an IPv6 PIM Hello message, it sets the aging time of the router port to
the Holdtime value in the Hello message.
If a static router port is configured, the Layer 2 device forwards received MLD Report and
Done messages to the static router port. If a static member port is configured for a multicast
group, the Layer 2 device adds the port to the outbound interface list for the multicast group.
After a Layer 2 multicast forwarding table is set up, the Layer 2 device searches the multicast
forwarding table for outbound interfaces of multicast data packets according to the VLAN IDs
and destination addresses (IPv6 group addresses) of the packets. If outbound interfaces are
found for a packet, the Layer 2 device forwards the packet to all the member ports of the
multicast group. If no outbound interface is found, the Layer 2 device drops the packet or
broadcasts the packet in the VLAN.
With MLD snooping proxy configured, Switch can terminate MLD Query messages sent from
Router and MLD Report/Done messages sent from downstream hosts. When receiving these messages,
Switch constructs new messages and sends them to Router.
After MLD snooping proxy is deployed on the Layer 2 device, the Layer 3 device considers
that it interacts with only one user. The Layer 2 device interacts with the upstream device and
downstream hosts. The MLD snooping proxy function conserves bandwidth by reducing
MLD message exchanges. In addition, MLD snooping proxy functions as a querier to process
protocol messages received from downstream hosts and maintain group memberships. This
reduces the load of the upstream Layer 3 device.
Implementation
A device that runs MLD snooping proxy sets up and maintains a Layer 2 multicast forwarding
table and sends multicast data to hosts based on the multicast forwarding table. Table 1-328
describes how the MLD snooping proxy device processes MLD messages.
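The aggregation behavior, in which the upstream router sees the proxy as a single host, can be sketched as follows: the proxy relays a Report only for the first member of a group and a Done only after the last member leaves. This is a simplified model; real proxies also run querier logic and timers.

```python
class MldSnoopingProxy:
    """Sketch: terminate downstream MLD messages and send the upstream
    router at most one Report/Done per group, aggregating membership."""

    def __init__(self, send_upstream):
        self.members = {}                  # group -> set of downstream ports
        self.send_upstream = send_upstream  # callback toward the Layer 3 device

    def on_report(self, group, port):
        first = group not in self.members
        self.members.setdefault(group, set()).add(port)
        if first:
            self.send_upstream(("Report", group))  # only the first join goes up

    def on_done(self, group, port):
        ports = self.members.get(group, set())
        ports.discard(port)
        if group in self.members and not ports:
            del self.members[group]
            self.send_upstream(("Done", group))    # only the last leave goes up
```

Every intermediate join or leave is absorbed locally, which is how the proxy reduces the number of MLD messages the upstream device must process.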
1.11.11.3 Applications
1.11.11.3.1 Application of Layer 2 Multicast for IPTV Services
Service Overview
IPTV services are video services provided for users through an IP network. IPTV services
pose high requirements for bandwidth, real-time transmission, and reliability on IP MANs.
Multiple users can receive the same IPTV service data simultaneously.
Given the characteristics of IPTV, multicast technologies can be used to bear IPTV services.
Compared with traditional unicast, multicast ensures that network bandwidth demands do not
increase with the number of users and reduces the workload of video servers and the bearer
network. If service providers want to deploy IPTV services in a rapid and economical way,
E2E multicast push is recommended.
Network Description
Currently, the IP MAN consists of a metro backbone network and broadband access network.
IPTV service traffic is pushed to user terminals through the metro backbone network and
broadband access network in sequence. Figure 1-1048 shows an E2E IPTV service push
model. The metro backbone network is mainly composed of network layer (Layer 3) devices.
PIM such as PIM-SM is used on each device on the metro backbone to connect to the
multicast source and IGMP is used on the devices directly connected to the broadband access
network to forward multicast packets to user terminals. The broadband access network is
mainly composed of data link layer (Layer 2) devices. Layer 2 multicast techniques such as
IGMP proxy or IGMP snooping can be used on Layer 2 devices to forward multicast packets
to terminal users.
The following section describes Layer 2 multicast features used on the broadband access
network.
Feature Deployment
The broadband access network is constructed using Layer 2 devices. Layer 2 devices
exchange or forward data frames by MAC address. They have weak IP packet parsing and
routing capabilities. As a result, the Layer 2 devices do not support Layer 3 multicast
protocols. Previously, Layer 2 devices broadcast IPTV multicast traffic to all interfaces, which
easily results in broadcast storms.
To solve the problem of multicast packet flooding, commonly used Layer 2 multicast
forwarding techniques, such as IGMP snooping, IGMP proxy, and multicast VLAN, can be
used.
Deploy IGMP snooping on all Layer 2 devices, so that they listen to IGMP messages
exchanged between Layer 3 devices and user terminals and maintain multicast group
memberships, implementing on-demand multicast traffic forwarding.
Deploy IGMP snooping proxy on CEs close to user terminals, so that the CEs listen to,
filter, and forward IGMP messages. This reduces the number of multicast protocol
packets directly exchanged between CEs and upstream devices, and reduces packet
processing pressure on upstream devices.
Deploy multicast VLAN on CEs close to user terminals to reduce the network
bandwidth required for transmissions between CEs and multicast sources.
The following features can also be deployed on Layer 2 devices:
VSI or VLAN-based Layer 2 multicast instance (a multicast VLAN enhancement) can be
deployed on CEs close to user terminals to reduce the network bandwidth required for
transmissions between CEs and multicast sources.
If the number of user terminals attached to a CE exceeds the number of IPTV channels,
static multicast groups can be configured on the CE to increase the channel change speed
and improve the QoS for IPTV services.
If user hosts support IGMPv1 and IGMPv2 only, SSM mapping can be deployed on the
CE connected to these user terminals so the user hosts can access SSM services.
Rapid multicast traffic forwarding can be deployed on a backup PE to improve the
reliability of links between the PE and CE.
This example uses an IPTV channel with a bandwidth of 2 Mbit/s.
If a Layer 2 device uses no Layer 2 multicast forwarding technology, the device forwards
multicast packets to all IPTV users. Broadcasting multicast packets for five IPTV
channels leads to network congestion. This is the case even if the bandwidth of the
interface connecting the Layer 2 device to users is 10 Mbit/s.
After Layer 2 multicast forwarding technologies are used on the Layer 2 device, the
Layer 2 device sends multicast packets only to users that require the multicast packets. If
each interface of the Layer 2 device is connected to at least one IPTV user terminal,
multicast packets (2 Mbit/s traffic) for at most one IPTV channel are forwarded to
corresponding interfaces. This ensures the availability of adequate network bandwidth
and the quality of user experience.
Networking Description
As shown in Figure 1-1049, a multicast source exists on an IPv6 PIM network and provides
multicast video services for users on the LAN. Some users such as HostA and HostC on the
LAN want to receive video data in multicast mode. To prevent multicast data from being
broadcast on the LAN, configure MLD snooping on Layer 2 multicast devices to accurately
forward multicast data on the Layer 2 network, which prevents bandwidth waste and network
information leakage.
Deployed Features
You can deploy the following features to accurately forward multicast data on the network
shown in Figure 1-1049:
IPv6 PIM and MLD on the Layer 3 multicast device Router to route multicast data to
user segments.
MLD snooping on the Layer 2 device Switch so that Switch can set up and maintain a
Layer 2 multicast forwarding table to forward multicast data to specified users.
MLD snooping proxy after configuring MLD snooping on Switch to release Router from
processing a large number of MLD messages.
Terms
Term Definition
(*, G) A multicast routing entry used in the ASM model. * indicates
any source, and G indicates a multicast group.
(*, G) applies to all multicast messages with the multicast
group address as G. That is, all the multicast messages sent to
G are forwarded through the downstream interface of the (*, G)
entry, regardless of which multicast sources send the multicast
messages.
(S, G) A multicast routing entry used in the SSM model. S indicates a
multicast source, and G indicates a multicast group.
After a multicast packet with S as the source address and G as
the group address reaches a router, it is forwarded through the
downstream interfaces of the (S, G) entry.
A multicast packet that contains a specified source address is
expressed as an (S, G) packet.
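The two entry types interact in forwarding lookups: an (S, G) entry, being more specific, is preferred over a (*, G) entry for the same group. A minimal sketch of that preference (entry contents are hypothetical):

```python
def lookup(routes, source, group):
    """Prefer the more specific (S, G) entry; fall back to (*, G)."""
    if (source, group) in routes:
        return routes[(source, group)]
    return routes.get(("*", group))

routes = {
    ("*", "225.1.1.1"): ["if1"],          # (*, G): any source sending to G
    ("10.0.0.9", "225.1.1.1"): ["if2"],   # (S, G): one specific source
}
assert lookup(routes, "10.0.0.9", "225.1.1.1") == ["if2"]  # (S, G) match wins
assert lookup(routes, "10.0.0.7", "225.1.1.1") == ["if1"]  # falls back to (*, G)
```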
1.12 MPLS
1.12.1 About This Document
Purpose
This document describes the MPLS feature in terms of its overview, principles, and
applications.
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
Network planning engineers
Commissioning engineers
Data configuration engineers
System maintenance engineers
Security Declaration
Encryption algorithm declaration
The encryption algorithms DES/3DES/SKIPJACK/RC2/RSA (RSA-1024 or
lower)/MD2/MD4/MD5 (in digital signature scenarios and password encryption)/SHA1
(in digital signature scenarios) have low security and may bring security risks. If the
protocols allow, using more secure encryption algorithms, such as AES/RSA
(RSA-2048 or higher)/SHA2/HMAC-SHA2, is recommended.
Password configuration declaration
− Do not set both the start and end characters of a password to "%^%#". This causes
the password to be displayed directly in the configuration file.
− To further improve device security, periodically change the password.
Personal data declaration
Your purchased products, services, or features may use some of users' personal data during
service operation or fault locating. You must define user privacy policies in compliance
with local laws and take proper measures to fully protect personal data.
Feature declaration
− The NetStream feature may be used to analyze the communication information of
terminal customers for network traffic statistics and management purposes. Before
enabling the NetStream feature, ensure that it is performed within the boundaries
permitted by applicable laws and regulations. Effective measures must be taken to
ensure that information is securely protected.
− The mirroring feature may be used to analyze the communication information of
terminal customers for a maintenance purpose. Before enabling the mirroring
function, ensure that it is performed within the boundaries permitted by applicable
laws and regulations. Effective measures must be taken to ensure that information is
securely protected.
− The packet header obtaining feature may be used to collect or store some
communication information about specific customers for transmission fault and
error detection purposes. Huawei cannot offer services to collect or store this
information unilaterally. Before enabling the function, ensure that it is performed
within the boundaries permitted by applicable laws and regulations. Effective
measures must be taken to ensure that information is securely protected.
Reliability design declaration
Network planning and site design must comply with reliability design principles and
provide device- and solution-level protection. Device-level protection includes planning
principles of dual-network and inter-board dual-link to avoid single point or single link
of failure. Solution-level protection refers to a fast convergence mechanism, such as FRR
and VRRP.
Special Declaration
This document serves only as a guide. The content is written based on device
information gathered under lab conditions. The content provided by this document is
intended to be taken as general guidance, and does not cover all scenarios. The content
provided by this document may be different from the information on user device
interfaces due to factors such as version upgrades and differences in device models,
board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are
beyond the scope of this document.
The maximum values provided in this document are obtained in specific lab
environments (for example, only a certain type of board or protocol is configured on a
tested device). The actually obtained maximum values may be different from the
maximum values provided in this document due to factors such as differences in
hardware configurations and carried services.
Interface numbers used in this document are examples. Use the existing interface
numbers on devices for configuration.
The pictures of hardware in this document are for reference only.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
Indicates an imminently hazardous situation which, if not
avoided, will result in death or serious injury.
Change History
Updates between document issues are cumulative. Therefore, the latest document issue
contains all updates made in previous issues.
Changes in Issue 03 (2017-09-20)
This issue is the third official release. The software version of this issue is
V800R009C10SPC200.
Changes in Issue 02 (2017-07-30)
This issue is the second official release. The software version of this issue is
V800R009C10SPC100.
Changes in Issue 01 (2017-05-30)
This issue is the first official release. The software version of this issue is
V800R009C10.
Background
The IP-based Internet prevailed in the mid-1990s. IP technology is simple and inexpensive
to deploy. However, IP forwarding, which relies on the longest-match algorithm, is not the
most efficient choice for forwarding packets.
In comparison, asynchronous transfer mode (ATM) is much more efficient at forwarding
packets. However, ATM technology is a complex protocol with a high deployment cost, which
has hindered its widespread popularity and growth.
Users wanted a technology that combines the best of what IP and ATM have to offer. MPLS
technology emerged to meet this need.
Multiprotocol Label Switching (MPLS) is designed to increase forwarding rates. Unlike IP
technology, MPLS analyzes packet headers on the edge of a network, not at each hop.
Therefore, packet processing time is shortened.
MPLS supports multi-layer labels, and its forwarding plane is connection-oriented. MPLS is
widely used in virtual private network (VPN), traffic engineering (TE), and quality of service
(QoS) scenarios.
Overview
MPLS operates between the data link layer and the network layer in the TCP/IP protocol stack.
MPLS supports label switching between multiple network protocols, as implied by its name.
MPLS can use any Layer 2 media to transfer packets, but is not limited by any specific
protocol on the data link layer.
MPLS is derived from the Internet Protocol version 4 (IPv4). The core MPLS technology can
be extended to multiple network protocols, such as the Internet Protocol version 6 (IPv6),
Internet Packet Exchange (IPX), Appletalk, DECnet, and Connectionless Network Protocol
(CLNP). Multiprotocol in MPLS means that the protocol supports multiple network protocols.
The MPLS technology supports multiple protocols and services and improves data
transmission security.
1.12.2.2 Principles
1.12.2.2.1 Concepts
All LSRs on the MPLS network forward data based on labels. When an IP packet enters an
MPLS network, an LER adds a label to it. Before the IP packet leaves the MPLS network,
another LER removes the label.
The path that MPLS packets take on an MPLS network is called a label switched path (LSP).
Label
A label is a 20-bit identifier that uniquely identifies the FEC to which a packet belongs. Upon
receiving an IP packet from a non-MPLS network, the ingress of an LSP creates an MPLS
header in the packet and inserts a specific label into this field, which turns the IP packets into
MPLS packets. A label is only meaningful to a local end. A FEC can be mapped to multiple
incoming labels to balance loads, but a label only represents a single FEC.
Figure 1-1052 illustrates the structure of an MPLS header.
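The 32-bit MPLS header can be packed and unpacked as follows. This sketch assumes the standard field layout: a 20-bit label, a 3-bit traffic class field (originally named EXP), a 1-bit bottom-of-stack flag (S), and an 8-bit TTL.

```python
def encode_mpls(label, tc, s, ttl):
    """Pack one 32-bit MPLS header: label(20) | TC(3) | S(1) | TTL(8)."""
    word = (label << 12) | (tc << 9) | (s << 8) | ttl
    return word.to_bytes(4, "big")

def decode_mpls(data):
    """Unpack the first 4 bytes of an MPLS label stack entry."""
    word = int.from_bytes(data[:4], "big")
    return {"label": word >> 12, "tc": (word >> 9) & 0x7,
            "s": (word >> 8) & 0x1, "ttl": word & 0xFF}

# Round trip with an arbitrary (hypothetical) label value.
hdr = encode_mpls(label=1025, tc=0, s=1, ttl=64)
assert decode_mpls(hdr) == {"label": 1025, "tc": 0, "s": 1, "ttl": 64}
```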
Label Space
Label space is the label value range. The NE20E supports the following label ranges:
special labels. For details about special labels, see Table 1-329.
label space shared by static LSPs and static constraint-based routed label switched paths
(CR-LSPs).
label space used by dynamic signaling protocols, such as Label Distribution Protocol
(LDP), Resource Reservation Protocol-Traffic Engineering (RSVP-TE), and
Multiprotocol Extensions for Border Gateway Protocol (MP-BGP).
Each dynamic signaling protocol uses independent and contiguous values.
0 IPv4 Explicit NULL Label: If the egress receives a packet carrying a
label with this value, the egress must remove the label from the packet.
The egress then forwards the packet using IPv4.
1 Router Alert Label: If a node receives a packet carrying a label with
this value, the node sends the packet to a software module, without
implementing hardware forwarding. The node forwards the packet based on
the next layer label. If the packet needs to be forwarded using
hardware, the node pushes the Router Alert Label back onto the top of
the label stack before forwarding the packet. This label takes effect
only when it is not at the bottom of a label stack.
2 IPv6 Explicit NULL Label: If the egress receives a packet carrying a
label with this value, the egress removes the label from the packet and
forwards the packet using IPv6.
3 Implicit NULL Label: If the penultimate LSR receives a packet carrying
a label with this value, the penultimate LSR removes the label and
forwards the packet (now, an IP or VPN packet) to the egress. The egress
then forwards the packet over IP or VPN routes.
4 to 13 Reserved: N/A
14 OAM Router Alert Label: If the ingress receives a packet carrying a
label with this value, the ingress considers it an Operation,
Administration and Maintenance (OAM) packet and
Label Stack
Labels in an MPLS packet can be stacked. The label next to the Layer 2 header is the top or
outer label. The label next to the Layer 3 header is the bottom or inner label. Theoretically,
there is no limit on the number of MPLS labels that can be stacked. Figure 1-1054
illustrates a label stack.
The labels are processed from the top of the stack based on the last in, first out principle.
Label Operations
The label forwarding table defines the following label operations:
Push: The ingress adds a label into an IP packet between the Layer 2 header and IP
header before forwarding the packet. Within an MPLS network, each LSR adds a new
label to the top of the label stack.
Swap: A transit node replaces a label on the top of the label stack in an MPLS packet
with another label, which is assigned by the next hop.
Pop: A transit LSR or the egress removes the top label from the label stack to decrease
the number of labels in the stack. Either the egress or the penultimate LSR removes a
label from the MPLS packet before the packet leaves an MPLS network.
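The three operations along an LSP can be sketched on a label stack modeled as a list (top of stack first); the label values are hypothetical.

```python
def ingress_push(stack, label):
    return [label] + stack          # Push: add a new label to the top of the stack

def transit_swap(stack, new_label):
    return [new_label] + stack[1:]  # Swap: replace the top label with the next hop's

def pop(stack):
    return stack[1:]                # Pop: remove the top label (e.g. penultimate hop)

# A packet crossing ingress -> transit -> penultimate LSR:
stack = ingress_push([], 1025)      # ingress assigns label 1025
stack = transit_swap(stack, 1027)   # transit swaps in the label assigned downstream
stack = pop(stack)                  # penultimate-hop popping leaves a plain IP packet
assert stack == []
```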
Label Distribution
An LSR records a mapping between a label and FEC and notifies upstream LSRs of the
mapping. This process is called label distribution.
On the network shown in Figure 1-1056, packets with the destination address 192.168.1.0/24
are assigned to a specific FEC. LSRB and LSRC allocate labels that represent the FEC and
advertise the mapping between labels and the FEC to upstream LSRs.
MPLS Architecture
As shown in Figure 1-1057, the MPLS architecture consists of a control plane and a
forwarding plane.
The control plane is connectionless and is used to distribute labels, create a label forwarding
table, and establish or tear down LSPs.
The forwarding plane, also known as the data plane, is connection oriented. It can use services and protocols supported by ATM and Ethernet. The forwarding plane adds labels to
IP packets and removes labels from MPLS packets. It forwards packets based on the label
forwarding table.
Procedure
MPLS assigns packets to a FEC, distributes labels that identify the FEC, and establishes an
LSP. Packets travel along the LSP.
On the network shown in Figure 1-1058, packets destined for 3.3.3.3 are assigned to a FEC. Each downstream LSR assigns a label to the FEC and uses a label advertisement protocol to inform its upstream LSR of the mapping between the label and the FEC. Each upstream LSR adds the mapping to its label forwarding table. An LSP is established using the label mapping information.
LSPs can be either static or dynamic. Static LSPs are established manually. Dynamic LSPs are
established using a routing protocol and a label distribution protocol.
− Bandwidth constraint
− Link colors
− Explicit paths
Multiprotocol Extensions for Border Gateway Protocol (MP-BGP)
MP-BGP is an extension to BGP. MP-BGP defines community attributes. MP-BGP
supports label distribution for packets transmitted over MPLS virtual private network
(VPN) routes and labeled inter-AS VPN routes.
Background
A network with increasing scale and complexity allows for devices of various specifications.
Without packet fragmentation enabled, an MPLS P node transparently transmits packets sent
by the ingress PE to the egress PE. If the MTU configured on the ingress PE is greater than
the MRU configured on the egress PE, the egress PE discards packets with sizes larger than
the MRU.
Principles
In Figure 1-1059, the ingress PE1 has MTU1 greater than MRU2 on the egress PE2. PE2 is
enabled to discard packets with sizes larger than MRU2. Without packet fragmentation
enabled, a P node transparently forwards a packet with a size of LENGTH (MTU1 >
LENGTH > MRU2) to PE2. Since the packet length is greater than MRU2, PE2 discards the
packet. After packet fragmentation is enabled on the P node, the P node fragments the same packet into a packet with the size of MTU2 (MTU2 < MRU2) and a packet with a specified size (LENGTH minus MTU2). If the LENGTH minus MTU2 value is still greater than MTU2, that fragment is fragmented again. After the fragments reach PE2, PE2 properly forwards them
fragment is also fragmented. After the fragments reach PE2, PE2 properly forwards them
because their lengths are less than MRU2.
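The fragmentation arithmetic above can be sketched as repeated splitting. A minimal sketch, assuming MTU2 < MRU2 as in the text; the byte values are illustrative:

```python
# Sketch of the P node's fragmentation: a packet of size `length` is split
# into fragments no larger than mtu2, repeating until every fragment fits.

def fragment(length, mtu2):
    """Return the sizes of the fragments produced for a packet of `length`."""
    fragments = []
    while length > mtu2:
        fragments.append(mtu2)
        length -= mtu2  # the remainder (LENGTH minus MTU2) may be split again
    fragments.append(length)
    return fragments

# A 4000-byte packet fragmented with MTU2 = 1500 bytes:
print(fragment(4000, 1500))  # [1500, 1500, 1000]
```

Every resulting fragment is at most MTU2 bytes, so each one is smaller than MRU2 and PE2 forwards it instead of discarding it.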
Background
A static CR-LSP is established using manually configured forwarding and resource
information. Signaling protocols and path calculation are not used during the setup of
CR-LSPs. Setting up a static CR-LSP consumes few resources because the two ends of the CR-LSP do not need to exchange MPLS control packets. However, a static CR-LSP cannot be adjusted dynamically when the network topology changes. A static CR-LSP configuration error may cause protocol packets of different NEs and statuses to interfere with one another, which adversely affects services. To address the preceding problem, a device can be enabled to check the source interfaces of static CR-LSPs. With this function configured, the device forwards packets only if both the labels and inbound interfaces are correct.
Principles
In Figure 1-1060, static CR-LSP1 is configured, with PE1 functioning as the ingress, the P as
a transit node, and PE2 as the egress. The P's inbound interface connected to PE1 is Interface1
and the incoming label is Label1. Static CR-LSP2 remains on PE3 that functions as the
ingress of CR-LSP2. The P's inbound interface connected to PE3 is Interface2 and the
incoming label is Label1. If PE3 sends traffic along CR-LSP2 and Interface2 on the P receives the traffic, the P checks the inbound interface information and finds that the traffic carries Label1 but arrives on Interface2 rather than Interface1. Consequently, the P discards the traffic.
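The source-interface check can be sketched as a lookup that must match on both fields. The table and names below mirror the example (Label1, Interface1, Interface2) but are otherwise hypothetical:

```python
# Sketch of the static CR-LSP source-interface check: the device forwards a
# packet only if both the incoming label and the inbound interface match the
# configured binding.

# P's configuration from static CR-LSP1: Label1 is bound to Interface1.
expected_in_interface = {"Label1": "Interface1"}

def accept(label, in_interface):
    """True only when the label is known and arrived on its configured interface."""
    return expected_in_interface.get(label) == in_interface

# Traffic from PE1 arrives on Interface1 carrying Label1: forwarded.
print(accept("Label1", "Interface1"))  # True
# Traffic from PE3 arrives on Interface2 carrying Label1: discarded.
print(accept("Label1", "Interface2"))  # False
```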
1.12.2.3 Applications
1.12.2.3.1 MPLS-based VPN
A traditional virtual private network (VPN) transmits private network data over a public
network using tunneling protocols, such as the Generic Routing Encapsulation (GRE), Layer
2 Tunneling Protocol (L2TP), and Point to Point Tunneling Protocol (PPTP).
An MPLS-based VPN, which is as secure as Frame Relay networks, does not encapsulate or
encrypt packets; therefore, IP Security (IPsec), GRE, or L2TP tunnels do not need to be
deployed. The MPLS-based VPN helps minimize the network delay time.
The MPLS-based VPN technology can establish LSPs to connect private network branches
within a single VPN and to connect VPNs.
Figure 1-1061 illustrates an MPLS-based VPN. The following devices are deployed on the
MPLS-based VPN:
Customer edge (CE): an edge device on a customer network. The CE can be a router,
switch, or host.
Provider edge (PE): an edge device on a service provider network.
Provider (P): a backbone device on the service provider network that does not connect to CEs directly. A P has basic MPLS forwarding capabilities but does not maintain VPN information.
You can also use PBR with LDP fast reroute (FRR) to divert some traffic to a backup LSP, balancing traffic between the primary and backup LSPs; otherwise, the backup LSP may remain relatively idle.
Definition
The Label Distribution Protocol (LDP) is a Multiprotocol Label Switching (MPLS) control
protocol. It classifies forwarding equivalence classes (FECs), distributes labels, and
establishes and maintains label switched paths (LSPs). LDP defines messages in the label
distribution process as well as procedures for processing these messages.
Purpose
On an MPLS network, LDP distributes label mappings and establishes LSPs. LDP sends
multicast Hello messages to discover local peers and sets up local peer relationships.
Alternatively, LDP sends unicast Hello messages to discover remote peers and sets up remote
peer relationships.
Two LDP peers establish a TCP connection, negotiate LDP parameters over the TCP
connection, and establish an LDP session. They exchange messages over the LDP session to
set up an LSP. LDP networking is simple to construct and configure, and LDP establishes
LSPs using routing information.
LDP applications are as follows:
LDP LSPs guide IP data across a full-mesh MPLS network, over which a Border
Gateway Protocol-free (BGP-free) core network can be built.
LDP works with BGP to establish end-to-end inter-autonomous system (inter-AS) or
inter-carrier tunnels to transmit Layer 3 virtual private network (L3VPN) services.
LDP over traffic engineering (TE) combines LDP and TE advantages to establish
end-to-end tunnels to transmit virtual private network (VPN) services.
1.12.3.2 Principles
1.12.3.2.1 Basic Concepts
The MPLS architecture consists of multiple label distribution protocols, in which LDP is
widely used.
LDP defines messages in the label distribution process and procedures for processing the
messages. Label switching routers (LSRs) obtain information about incoming labels, next-hop
nodes, and outgoing labels for specified FECs based on the local forwarding table. LSRs use
the information to establish LSPs.
For detailed information about LDP, see relevant standards (LDP Specification).
LDP Adjacency
When an LSR receives a Hello message from a peer, the LSR establishes an LDP adjacency with the peer. An LDP adjacency maintains a peer relationship between the two LSRs. There are two types of LDP adjacencies:
Local adjacency: established by exchanging Link Hello messages between two LSRs.
Remote adjacency: established by exchanging Targeted Hello messages between two LSRs.
LDP Peers
Two LDP peers set up LDP sessions and exchange Label Mapping messages over the session
so that they establish an LSP.
LDP Session
An LDP session between LSRs helps them exchange messages, such as Label Mapping
messages and Label Release messages. LDP sessions are classified into the following types:
Local LDP session: set up over a local adjacency. The two LSRs, one on each end of the
local LDP session, are directly connected.
Remote LDP session: set up over a remote adjacency. The two LSRs, one on each end of
the remote LDP session, can be either directly or indirectly connected.
LDP Messages
Two LSRs exchange the following messages:
Discovery message: used to notify or maintain the presence of an LSR on an MPLS
network.
Session message: used to establish, maintain, or terminate an LDP session between LDP
peers.
Advertisement message: used to create, modify, or delete a mapping between a specific
FEC and label.
Notification message: used to provide advisory information or error information.
LDP transmits Discovery messages using the User Datagram Protocol (UDP) and transmits
Session, Advertisement, and Notification messages using the Transmission Control Protocol
(TCP).
an LDP identifier and other information, such as the hello-hold time and transport
address. If an LSR receives a Targeted Hello message, the LSR has a potential LDP peer.
After both LSRA and LSRB have accepted each other's Keepalive messages, the LDP session
is successfully established.
If a DU LDP session is established between an LSR and its peer, a liberal LSP is established. This liberal LSP can function as a backup LSP after LDP FRR is enabled.
If a DoD LDP session is established between an LSR and its peer, the LSR sends a
Release message to tear down label-based bindings.
An LSP is established along the path LSRA -> LSRB -> LSRC. LSRC functions as a proxy egress and extends the LSP to LSRD. The extended LSP is a proxy egress LSP.
Background
If a direct link for a local LDP session fails, the LDP adjacency is torn down, and the session
and labels are deleted. After the direct link recovers, the local LDP session is reestablished
and distributes labels so that an LSP can be reestablished over the session. Before the LSP is
reestablished, however, LDP LSP traffic is dropped.
To speed up LDP LSP convergence and minimize packet loss, the NE20E implements LDP
session protection.
LDP session protection helps maintain an LDP session, eliminating the need to reestablish an
LDP session or re-distribute labels.
Principles
In Figure 1-1067, LDP session protection is configured on the nodes at both ends of a link.
The two nodes exchange Link Hello messages to establish a local LDP session and exchange
Targeted Hello messages to establish a remote LDP session, forming a backup relationship
between the remote LDP session and local LDP session.
In Figure 1-1067, if the direct link between LSRA and LSRB fails, the adjacency established
using Link Hello messages is torn down. Because the indirectly connected link is working
properly, the remote adjacency established using Targeted Hello messages remains. Therefore,
the LDP session is maintained by the remote adjacency, and the mapping between FECs and
labels for the session also remains. After the direct link recovers, the local LDP session can
rapidly restore LSP information. There is no need to reestablish the LDP session or
re-distribute labels, which minimizes the time required for LDP session convergence.
Background
On an MPLS network with both active and standby links, if an active link fails, IGP routes
re-converge, and the IGP route of the standby link becomes reachable. An LDP LSP over the
standby link is then established. During this process, some traffic is lost. To minimize traffic
loss, LDP Auto FRR is used.
On the network enabled with LDP Auto FRR, if an interface failure (detected by the interface
itself or by an associated BFD session) or a primary LSP failure (detected by an associated
BFD session) occurs, LDP FRR is notified of the failure and rapidly forwards traffic to a
backup LSP, protecting traffic on the primary LSP. The traffic switchover minimizes the
traffic interruption time.
Implementation
LDP LFA FRR
LDP LFA FRR is implemented based on IGP LFA FRR's LDP Auto FRR. LDP LFA FRR uses the liberal label retention mode to obtain a liberal label, applies for a forwarding entry associated with the label, and delivers the forwarding entry to the forwarding plane as a backup forwarding entry for the primary LSP. If an interface detects a fault itself, bidirectional forwarding detection (BFD) detects an interface fault, or BFD detects a primary LSP failure, LDP LFA FRR rapidly switches traffic to the backup LSP to protect traffic on the primary LSP.
Figure 1-1068 Typical usage scenario for LDP Auto FRR (triangle topology)
Figure 1-1068 shows a typical usage scenario for LDP Auto FRR. The preferred
LSRA-to-LSRB route is LSRA-LSRB and the second optimal route is LSRA-LSRC-LSRB. A
primary LSP between LSRA and LSRB is established on LSRA, and a backup LSP of
LSRA-LSRC-LSRB is established to protect the primary LSP. After receiving a label from
LSRC, LSRA compares the label with the LSRA-to-LSRB route. Because the next hop of the
LSRA-to-LSRB route is not LSRC, LSRA preserves the label as a liberal label.
If the backup route corresponding to the source of the liberal label exists and its destination meets the policy for LDP to create a backup LSP, LSRA can apply for a forwarding entry for the liberal label, establish a backup LSP as the backup forwarding entry of the primary LSP, and send the entries mapped to both the primary and backup LSPs to the forwarding plane. In this way, the primary LSP is associated with the backup LSP.
LDP Auto FRR is triggered when an interface detects a fault itself, BFD detects an interface fault, or BFD detects a primary LSP failure. After LSP FRR is complete, traffic is switched to the backup LSP based on the backup forwarding entry. The route then converges to LSRA-LSRC-LSRB: an LSP is established over the new path (the original backup LSP), the original primary LSP is torn down, and traffic is forwarded along the new LSP over the path LSRA-LSRC-LSRB.
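The backup next hop that IGP LFA FRR selects must be loop-free. The standard loop-free alternate condition (from RFC 5286, not stated in this document) can be sketched as follows; the costs model the triangle topology of Figure 1-1068 and are hypothetical:

```python
# Sketch of the loop-free alternate (LFA) condition: a neighbor N of source S
# is a valid backup next hop toward destination D if
#     dist(N, D) < dist(N, S) + dist(S, D)
# i.e. N reaches D without looping back through S.

def is_lfa(dist_n_d, dist_n_s, dist_s_d):
    return dist_n_d < dist_n_s + dist_s_d

# S = LSRA, D = LSRB, candidate backup neighbor N = LSRC, all link costs 10:
# LSRC reaches LSRB directly (cost 10) rather than via LSRA (10 + 10).
print(is_lfa(10, 10, 10))  # True
# If LSRC's only path to LSRB cost 30, it would route back through LSRA:
print(is_lfa(30, 10, 10))  # False
```

When the condition holds, the liberal label learned from LSRC can safely back the primary forwarding entry, as described above.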
LDP Remote LFA FRR
LDP LFA FRR cannot calculate backup paths on large networks, especially ring networks,
which fails to meet reliability requirements. To address this issue, LDP Remote LFA FRR is
used. Remote LFA FRR is implemented based on IGP Remote LFA FRR's (1.10.8.2.13 IS-IS
Auto FRR) LDP Auto FRR. Figure 1-1069 illustrates the typical LDP Auto FRR usage
scenario. The primary LDP LSP is established over the path PE1 -> PE2. Remote LFA FRR
establishes a Remote LFA FRR LSP over the path PE1 -> P2 -> PE2 to protect the primary
LDP LSP.
Figure 1-1069 Typical LDP Auto FRR usage scenario - ring topology
Background
LDP-IGP synchronization enables the LDP status and the IGP status to go Up simultaneously,
which helps minimize traffic interruption time if a fault occurs.
A network provides active and standby links for redundancy. If the active link fails, both an
IGP route and an LDP LSP switch from the active link to the standby link. After the active
link recovers, the IGP route switches back to the active link earlier than the LDP LSP. Traffic
therefore switches to the IGP route over the active link but is dropped because the LSP is
unreachable over the new active link. To prevent traffic loss, LDP-IGP synchronization can be
configured.
On a network enabled with LDP-IGP synchronization, an IGP keeps advertising the maximum cost of the IGP route over the new active link to delay IGP route convergence until LDP converges. During this period, traffic keeps traveling along the standby link. The backup LSP is torn down only after the LSP over the active link is established.
LDP-IGP synchronization involves the following timers:
Hold-max-cost timer
Delay timer
Implementation
In Figure 1-1070, a network has both an active and standby link. When the active link
recovers from any fault, traffic is switched from the standby link to the active link.
During the traffic switchback, the backup LSP becomes unavailable, but a new LSP has not yet been set up over the active link when IGP route convergence is complete. This causes a brief traffic interruption. To help prevent this problem, LDP-IGP synchronization can be configured to delay the IGP route switchback until LDP converges. The backup LSP is not deleted and continues forwarding traffic until an LSP over the active link is established. The process of LDP-IGP synchronization is as follows:
a. A link recovers from a fault.
b. An LDP session is set up between LSR2 and LSR3. The IGP advertises the
maximum cost of the active link to delay the IGP route switchback.
c. Traffic is still forwarded along the backup LSP.
d. Once set up, the LDP session transmits Label Mapping messages and notifies the IGP to start synchronization.
e. The IGP advertises the normal cost of the active link, and its routes converge on
the original forwarding path. The LSP is reestablished and delivers entries to the
forwarding table.
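The sequence above boils down to a cost-advertisement rule: the IGP advertises the maximum cost until LDP reports that synchronization is complete. A minimal sketch; MAX_COST, the function name, and the cost values are illustrative, not real configuration:

```python
# Sketch of LDP-IGP synchronization: while LDP has not yet exchanged labels
# over the recovered active link, the IGP advertises the maximum cost so that
# routes (and traffic) stay on the standby link.

MAX_COST = 65535  # stands in for the IGP's maximum metric

def advertised_cost(normal_cost, ldp_synchronized):
    return normal_cost if ldp_synchronized else MAX_COST

# Steps b-c: the LDP session is coming up, so the active link looks expensive
# and traffic remains on the backup LSP.
print(advertised_cost(10, ldp_synchronized=False))  # 65535
# Step e: LDP has advertised its Label Mapping messages, so the normal cost
# is restored and routes converge back to the active link.
print(advertised_cost(10, ldp_synchronized=True))   # 10
```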
When OSPF is used, the status transits based on the flowchart shown in Figure 1-1071.
When IS-IS is used, the Hold-normal-cost state does not exist. After the Hold-max-cost timer expires,
IS-IS advertises the actual link cost, but the Hold-max-cost state is displayed even though this state
is nonexistent.
Usage Scenario
Figure 1-1072 shows an LDP-IGP synchronization scenario.
On the network shown in Figure 1-1072, an active link and a standby link are established.
LDP-IGP synchronization and LDP FRR are deployed.
Benefits
Packet loss is reduced during an active/standby link switchover, improving network
reliability.
1.12.3.2.9 LDP GR
LDP supports graceful restart (GR) that enables a Restarter, together with a Helper, to perform
a master/backup switchover or protocol restart, without interrupting traffic.
Background
If a node or link along an LDP LSP that is transmitting traffic fails, traffic switches to a
backup LSP. The path switchover speed depends on the detection duration and traffic
switchover duration. A delayed path switchover causes traffic loss. LDP fast reroute (FRR)
can be used to speed up the traffic switchover, but not the detection process.
As shown in Figure 1-1074, a local label switching router (LSR) periodically sends Hello
messages to notify each peer LSR of the local LSR's presence and establish a Hello adjacency
with each peer LSR. The local LSR constructs a Hello hold timer to maintain the Hello
adjacency with each peer. Each time the local LSR receives a Hello message, it updates the
Hello hold timer. If the Hello hold timer expires before a Hello message arrives, the LSR
considers the Hello adjacency disconnected. The Hello mechanism cannot rapidly detect link
faults, especially when a Layer 2 device is deployed between the local LSR and its peer.
The rapid, light-load BFD mechanism is used to quickly detect faults and trigger a
primary/backup LSP switchover, which minimizes data loss and improves service reliability.
BFD for LDP LSP is implemented by establishing a BFD session between two nodes on both
ends of an LSP and binding the session to the LSP. BFD rapidly detects LSP faults and
triggers a traffic switchover. When BFD monitors a unidirectional LDP LSP, the reverse path
of the LDP LSP can be an IP link, an LDP LSP, or a traffic engineering (TE) tunnel.
A BFD session that monitors LDP LSPs is negotiated in either static or dynamic mode:
Static configuration: The negotiation of a BFD session is performed using the local and
remote discriminators that are manually configured for the BFD session to be established.
On a local LSR, you can bind an LSP with a specified next-hop IP address to a BFD
session with a specified peer IP address.
Dynamic establishment: The negotiation of a BFD session is performed using the BFD
discriminator type-length-value (TLV) in an LSP ping packet. You must specify a policy
for establishing BFD sessions on a local LSR. The LSR automatically establishes BFD
sessions with its peers and binds the BFD sessions to LSPs using either of the following
policies:
− Host address-based policy: The local LSR uses all host addresses to establish BFD
sessions. You can specify a next-hop IP address and an outbound interface name of
LSPs and establish BFD sessions to monitor the specified LSPs.
− Forwarding equivalence class (FEC)-based policy: The local LSR uses host
addresses listed in a configured FEC list to automatically establish BFD sessions.
BFD uses the asynchronous mode to check LSP continuity. That is, the ingress and egress
periodically send BFD packets to each other. If one end does not receive BFD packets from
the other end within a detection period, BFD considers the LSP Down and sends an LSP
Down message to the LSP management (LSPM) module.
Although BFD for LDP is enabled on a proxy egress, a BFD session cannot be established for the
reverse path of a proxy egress LSP on the proxy egress.
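In asynchronous mode, the detection period described above follows the standard BFD rule (from RFC 5880, not specific to this product): a session is declared Down if no packet arrives within the detect multiplier times the agreed interval. A small sketch with hypothetical timer values:

```python
# Sketch of BFD asynchronous-mode fault detection: both ends periodically send
# BFD packets; if one end receives nothing for detect_multiplier consecutive
# intervals, it declares the LSP Down and notifies the LSPM module.

def detection_time_ms(detect_multiplier, rx_interval_ms):
    return detect_multiplier * rx_interval_ms

# With a multiplier of 3 and a 10 ms interval, an LSP fault is declared
# within 30 ms - far faster than LDP Hello hold timers allow.
print(detection_time_ms(3, 10))  # 30
```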
Usage Scenarios
BFD for LDP LSP can be used in the following scenarios:
Primary and bypass LDP FRR LSPs are established.
Primary and bypass virtual private network (VPN) FRR LSPs are established.
Benefits
BFD for LDP LSP provides a rapid, light-load fault detection mechanism for LDP LSPs,
which improves network reliability.
Background
As mobile services evolve from narrowband voice services to integrated broadband services,
providing rich voice, streaming media, and high speed downlink packet access (HSDPA)
services, the demand for network bandwidth is rapidly increasing. Meeting the bandwidth
demand on traditional bearer networks requires huge investments. Therefore, carriers are in
urgent need of an access mode that is low-cost, flexible, and highly efficient, and that can help them meet the challenges brought by the growth in broadband services. In this context, all-IP mobile bearer networks are an effective means of dealing with these issues. IP radio
access networks (RANs), a type of IP-based mobile bearer network, are increasingly widely
used.
IP RANs, however, have more complex reliability requirements than traditional bearer
networks when carrying broadband services. Traditional fault detection mechanisms cannot
trigger protection switching based on random bit errors. Therefore, bit errors may degrade or
even interrupt services on an IP RAN in extreme cases. Bit-error-triggered protection
switching can solve this problem.
Benefits
Bit-error-triggered LDP protection switching has the following benefits:
Protects traffic from random bit errors, improving service quality.
Enables devices to record bit error events, enabling carriers to quickly locate the nodes
or lines with bit errors and take corrective measures.
Related Concepts
LDP interface bit error rate
LDP interface bit error rate is the bit error rate detected by LDP on an interface. A node uses a
Link Hello message to report its LDP interface bit error rate to an upstream LDP peer.
LSP bit error rate
LSP bit error rate on a node = LSP bit error rate reported by the downstream LDP peer + the
LDP interface bit error rate reported by the downstream LDP peer.
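The formula above means each node simply accumulates the interface bit error rates reported from downstream, hop by hop, until the ingress holds the rate for the whole LSP. A minimal sketch; the rate values are illustrative, not measured:

```python
# Sketch of the LSP bit error rate calculation: at each hop,
#   local LSP rate = downstream LSP rate + downstream interface rate,
# so the ingress ends up with the sum of all interface rates along the LSP.

def lsp_bit_error_rate(hop_interface_rates):
    """hop_interface_rates lists the interface bit error rates reported hop by
    hop from the egress side toward the ingress; the result is the LSP bit
    error rate the ingress calculates."""
    lsp_rate = 0.0
    for interface_rate in hop_interface_rates:
        lsp_rate += interface_rate
    return lsp_rate

# Bit errors detected on interfaces if1 and if3, as in Figure 1-1075:
print(lsp_bit_error_rate([1e-6, 2e-6]))
```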
Implementation
The NE20E supports single-node and multi-node LDP bit error detection and calculation.
When LDP detects an interface bit error on a node along an LSP, the node sends a Link Hello
message to notify its upstream LDP peer of the interface bit error rate and a Label Mapping
message to notify its upstream LDP peer of the LSP bit error rate. Upon receipt of the
notifications, the upstream LDP peer uses the received interface bit error rate as the local LDP
interface bit error rate, adds the LDP interface bit error rate to the received LSP bit error rate
to obtain the local LSP bit error rate, and sends the interface bit error rate and local LSP bit
error rate to its upstream LDP peer. This process repeats until the ingress of the LSP calculates
its local LSP bit error rate. Figure 1-1075 illustrates the networking for bit-error-triggered
LDP protection switching.
In Figure 1-1075, an LSP is established between PE1 and PE2. If interfaces if1 and if3 both detect bit errors, the bit error rates are advertised hop by hop toward the ingress and accumulated along the LSP, as described in Figure 1-1075.
LDP only detects and advertises bit errors; service switching, such as PW switching or L3VPN route switching, occurs for the services carried over LDP.
spoofing. The MD5 message digest is a unique result calculated by an irreversible character
string conversion. If a message is modified during transmission, a different digest is generated.
After the message arrives at the receive end, the receive end can determine whether the packet
is modified by comparing the received digest with the pre-computed digest.
LDP MD5 authentication prevents LDP packets from being modified by generating a unique
digest for an information segment. This authentication mode is stricter than common
checksum verification for TCP connections.
Before an LDP message is sent over a TCP connection, LDP MD5 authentication is performed
by padding the TCP header with a unique digest. This digest is a result calculated by MD5
based on the TCP header, LDP message, and password set by the user.
When receiving this TCP packet, the receiver obtains the TCP header, digest, and LDP
message, and then uses MD5 to calculate a digest based on the received TCP header, received
LDP message, and locally stored password. The receiver compares the calculated digest with
the received one to check whether the packet is modified.
A password can be set in either ciphertext or simple text. The simple password is directly
recorded in the configuration file. The ciphertext password is recorded in the configuration
file after being encrypted using a special algorithm.
During the calculation of a digest, the character string originally entered by the user is used, regardless of whether the password is stored in simple text or ciphertext. That is, the encrypted form of a ciphertext password does not itself participate in the MD5 calculation.
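The send-and-verify flow above can be sketched with a hash over the TCP header, the LDP message, and the shared password, in the spirit of TCP MD5 signatures (RFC 2385). The field layout is deliberately simplified (a real digest also covers TCP pseudo-header fields), and the byte strings are hypothetical:

```python
# Sketch of LDP MD5 authentication: the sender pads the TCP header with a
# digest over (TCP header, LDP message, password); the receiver recomputes it
# with its locally stored password and compares.

import hashlib

def md5_digest(tcp_header: bytes, ldp_message: bytes, password: bytes) -> bytes:
    return hashlib.md5(tcp_header + ldp_message + password).digest()

def verify(tcp_header, ldp_message, received_digest, password):
    """True only if the recomputed digest matches the received one."""
    return md5_digest(tcp_header, ldp_message, password) == received_digest

sent = md5_digest(b"tcp-hdr", b"label-mapping", b"secret")
print(verify(b"tcp-hdr", b"label-mapping", sent, b"secret"))  # True
# A message modified in transit yields a different digest, so it is rejected.
print(verify(b"tcp-hdr", b"tampered-msg", sent, b"secret"))   # False
```

Because MD5 is irreversible, an attacker who modifies the message cannot produce a matching digest without knowing the password, which is the property the text relies on.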
Principles
LDP over TE establishes LDP LSPs across RSVP-TE areas. RSVP-TE is an MPLS tunnel
technique used to generate LSPs as tunnels for other protocols to transparently transmit
packets. LDP is another MPLS tunnel technique used to generate LDP LSPs. LDP over TE
allows an LDP LSP to span an RSVP-TE area so that a TE tunnel functions as a hop along an
LDP LSP.
After an RSVP-TE tunnel is established, an IGP (OSPF or IS-IS) locally computes routes or advertises link state advertisements (LSAs) or link state PDUs (LSPs) to select a TE tunnel interface as the outbound interface. In the following example, the originating router of the TE tunnel is directly connected to the tunnel's destination router through a logical interface. Packets are transparently transmitted along the TE tunnel.
In Figure 1-1076, P1, P2, and P3 belong to an RSVP-TE domain. PE1 and PE2 are located in
a VPN, and LDP sessions between PE1 and P1 and between P3 and PE2 are established. The
following example demonstrates the process of establishing an LDP LSP between PE1 and
PE2 over the RSVP-TE domain:
1. An RSVP-TE tunnel between P1 and P3 is set up. P3 assigns RSVP-Label-1 to P2, and
P2 assigns RSVP-Label-2 to P1.
2. PE2 initiates LDP to set up an LSP and sends a Label Mapping message carrying
LDP-Label-1 to P3.
3. Upon receipt, P3 sends a Label Mapping message carrying LDP-Label-2 to P1 over a
remote LDP session.
4. Upon receipt, P1 sends a Label Mapping message carrying LDP-Label-3 to PE1.
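With these mappings in place, the label stacks a packet carries along PE1 -> P1 -> P2 -> P3 -> PE2 can be sketched as below. The per-node operations are inferred from the label mappings in the text and are illustrative (for example, no penultimate-hop popping of the TE label is assumed):

```python
# Sketch of LDP over TE forwarding: inside the RSVP-TE domain the packet
# carries a two-level stack - an outer RSVP-TE label and an inner LDP label.
# The top of the stack is the head of the list.

def pe1(stack): return ["LDP-Label-3"] + stack                      # push LDP label
def p1(stack):  return ["RSVP-Label-2", "LDP-Label-2"] + stack[1:]  # swap LDP label, push TE label
def p2(stack):  return ["RSVP-Label-1"] + stack[1:]                 # swap TE label
def p3(stack):  return ["LDP-Label-1"] + stack[2:]                  # pop TE label, swap LDP label
def pe2(stack): return stack[1:]                                    # pop LDP label

stack = pe1([])
print(stack)   # ['LDP-Label-3']
stack = p1(stack)
print(stack)   # ['RSVP-Label-2', 'LDP-Label-2']
stack = p2(stack)
print(stack)   # ['RSVP-Label-1', 'LDP-Label-2']
stack = p3(stack)
print(stack)   # ['LDP-Label-1']
stack = pe2(stack)
print(stack)   # []
```

The inner LDP label is untouched between P1 and P3, which is exactly how the TE tunnel acts as a single hop of the LDP LSP.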
Usage Scenario
LDP over TE is used to transmit VPN services. Because carriers have difficulties in deploying
MPLS traffic engineering on an entire network, they use LDP over TE to plan a core TE area
and implement LDP outside this area. Figure 1-1077 illustrates an LDP over TE network.
The advantage of LDP over TE is that an LDP LSP is easier to operate and maintain than a TE
tunnel, and the resource consumption of LDP is lower than that of the RSVP soft state. On an
LDP over TE network, TE tunnels are deployed only in the core area, but not on all devices
including PEs. This simplifies deployment and maintenance on the entire network and
relieves burden from PEs. In addition, the core area can take full advantage of TE tunnels to
perform protection switchovers, path planning, and bandwidth protection.
Principles
LDP GTSM applies the Generalized TTL Security Mechanism (GTSM) to LDP.
To protect the router against attacks, GTSM checks the TTL in each packet to verify it. GTSM
for LDP verifies LDP packets exchanged between neighbor or adjacent (based on a fixed
number of hops) routers. The TTL range is configured on each router for packets from other
routers, and GTSM is enabled. If the TTL of an LDP packet received by a router configured
with LDP is out of the TTL range, the packet is considered invalid and discarded. Therefore,
the upper layer protocols are protected.
Usage Scenario
GTSM is used to protect the TCP/IP-based control plane against CPU usage attacks, for
example, CPU overload attacks. GTSM for LDP is used to verify all LDP packets to prevent
LDP from suffering CPU-based attacks when LDP receives and processes a large number of
forged packets.
In Figure 1-1078, LSR1 through LSR5 are core routers on the backbone network. When LSRA is connected to the backbone network through another device, LSRA may initiate an attack by forging LDP packets that are transmitted among LSR1 through LSR5.
Even after LSRA accesses the backbone network through another device and forges a packet, it cannot forge a valid TTL, because each intermediate device decrements the TTL of the packet.
A GTSM policy is configured on LSR1 through LSR5 separately and is used to verify packets
reaching possible neighbors. For example, on LSR5, the valid number of hops is set to 1 or 2,
and the valid TTL is set to 254 or 255 for packets sent from LSR2. The forged packet sent by
LSRA to LSR5 through multiple intermediate devices contains a TTL value that is out of the
preset TTL range. LSR5 discards the forged packet and prevents the attack.
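The check on LSR5 can be sketched as a TTL range test. The values mirror the example in the text (neighbors at most 2 hops away, valid TTL 254 or 255); the initial TTL of 255 is an assumption typical of GTSM deployments:

```python
# Sketch of the GTSM check: packets from a neighbor a known number of hops
# away must arrive with a TTL in a narrow valid range, because every
# intermediate router decrements the TTL by one.

def gtsm_accept(ttl, max_hops, initial_ttl=255):
    """Accept only packets whose TTL proves they crossed at most max_hops."""
    return initial_ttl - max_hops < ttl <= initial_ttl

# A genuine LDP packet from LSR2, at most two hops away (TTL 254 or 255):
print(gtsm_accept(254, max_hops=2))  # True
# LSRA's forged packet crossed many intermediate devices, so its TTL is low:
print(gtsm_accept(245, max_hops=2))  # False
```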
Principles
The local and remote LDP adjacencies can be connected to the same peer so that the peer is
maintained by both the local and remote LDP adjacencies.
On the network shown in Figure 1-1079, when the local LDP adjacency is deleted due to a
failure in the link to which the adjacency is connected, the peer's type may change without
affecting its presence or status. (The peer type is determined by the adjacency type. The types
of adjacencies can be local, remote, and coexistent local and remote.)
If the link becomes faulty or recovers from a fault, the peer type may change, and the type of the session associated with the peer changes accordingly. However, the session is neither deleted nor brought Down; it remains Up.
Usage Scenario
Figure 1-1079 Networking diagram for a coexistent local and remote LDP session
A coexistent local and remote LDP session typically applies to L2VPNs. On the network
shown in Figure 1-1079, L2VPN services are transmitted between PE1 and PE2. When the
directly connected link between PE1 and PE2 recovers from a disconnection, the processing
of a coexistent local and remote LDP session is as follows:
1. MPLS LDP is enabled on the directly connected PE1 and PE2, and a local LDP session
is set up between PE1 and PE2. PE1 and PE2 are configured as the remote peer of each
other, and a remote LDP session is set up between PE1 and PE2. Local and remote
adjacencies are then set up between PE1 and PE2. From then on, both local and remote LDP adjacencies exist between PE1 and PE2. L2VPN signaling messages are transmitted through the coexistent local and remote LDP session.
2. When the physical link between PE1 and PE2 becomes Down, the local LDP adjacency
also goes Down. The route between PE1 and PE2 is still reachable through the P, which
means that the remote LDP adjacency remains Up. The session changes to a remote
session so that it can remain Up. The L2VPN does not detect the change in session status
and does not delete the session. This prevents the L2VPN from having to disconnect and
recover services, and shortens service interruption time.
3. When the fault is rectified, the link between PE1 and PE2 and the local LDP
adjacency go Up again. The session changes back to a coexistent local and remote
LDP session and remains Up. Again, the L2VPN does not detect the change in session
status and does not delete the session, which reduces service interruption time.
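The session-type transitions in the steps above can be modeled as a function of which adjacencies are currently Up. This is a minimal sketch (not VRP code); the type names are assumptions for illustration.

```python
# Minimal sketch of how a coexistent local and remote LDP session's type
# follows the set of Up adjacencies without the session ever going Down,
# as long as at least one adjacency remains Up.

def session_type(adjacencies: set) -> str:
    """Return the session type implied by the Up adjacencies."""
    if not adjacencies:
        return "Down"                    # no adjacency left at all
    if adjacencies == {"local"}:
        return "local"
    if adjacencies == {"remote"}:
        return "remote"
    return "coexistent local and remote"

adj = {"local", "remote"}
assert session_type(adj) == "coexistent local and remote"

adj.discard("local")   # direct link fails; remote adjacency stays Up via the P
assert session_type(adj) == "remote"    # type changes, session remains Up

adj.add("local")       # link recovers
assert session_type(adj) == "coexistent local and remote"
```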
Figure 1-1080 Networking diagram for both upstream and downstream LSRs assigned labels by
LDP
In addition, split horizon can be configured to have Label Mapping messages only sent to
specified upstream LSRs.
1.12.3.2.18 mLDP
mLDP, the multipoint extensions for the Label Distribution Protocol, transmits
multicast services over IP or Multiprotocol Label Switching (MPLS) backbone
networks, which simplifies network deployment.
Background
Traditional core and backbone networks run IP and MPLS to flexibly transmit unicast packets
and provide high reliability and traffic engineering (TE) capabilities.
The proliferation of applications, such as IPTV, multimedia conference, and massively
multiplayer online role-playing games (MMORPGs), amplifies demands on multicast
transmission over IP/MPLS networks. The existing P2P MPLS technology requires a transmit
end to deliver the same data packet to each receive end, which wastes network bandwidth
resources.
The point-to-multipoint (P2MP) Label Distribution Protocol (LDP) technique defined in
mLDP can be used to address the preceding problem. P2MP LDP extends the MPLS LDP
protocol to meet P2MP transmission requirements and uses bandwidth resources much more
efficiently.
Figure 1-1081 shows the P2MP LDP LSP networking. A tree-shaped LSP originates at the
ingress PE1 and is destined for egresses PE3, PE4, and PE5. The ingress directs multicast
traffic into the LSP. The ingress sends a single packet along the trunk to the branch node P4.
P4 replicates the packet and forwards the packet to its connected egresses. This process
prevents duplicate packets from wasting trunk bandwidth.
Related Concepts
Table 1-330 describes the nodes used on the P2MP LDP network shown in Figure 1-1081.
Root node: An ingress on a P2MP LDP LSP. The root node initiates LSP calculation
and establishment and pushes a label into each multicast packet before forwarding
the packet along the established LSP. Example: PE1.
Transit node: An intermediate node that swaps an incoming label for an outgoing
label in each MPLS packet. A branch node may also function as a transit node.
Examples: P1 and P3.
Leaf node: A destination node on a P2MP LDP LSP. Examples: PE3, PE4, and PE5.
Bud node: An egress of one sub-LSP and a transit node of other sub-LSPs. A bud
node is connected to a customer edge (CE) device and also functions as an egress.
Example: PE2.
Branch node: A node from which LSP branches (sub-LSPs) start. A branch node
replicates packets and swaps an incoming label for an outgoing label in each
packet before forwarding the packet to each leaf node. Example: P4.
Implementation
The procedure for using mLDP to establish and maintain a P2MP LDP LSP is as follows:
Nodes negotiate the P2MP LDP capability with each other.
mLDP enables a node to negotiate the P2MP LDP capability with a peer node and
establish an mLDP session with the peer node.
A P2MP LDP LSP is established.
Each leaf and transit node sends a Label Mapping message upstream until the root
node receives Label Mapping messages from its downstream nodes. The root node
then establishes a P2MP LDP LSP with sub-LSPs destined for the leaf nodes.
A node deletes a P2MP LDP LSP.
A node of a specific type uses a specific rule to delete an LSP, which minimizes the
service interruptions.
The P2MP LDP LSP updates.
If the network topology or link cost changes, the P2MP LDP LSP updates automatically
based on a specified rule, which ensures uninterrupted service transmission.
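The upstream label-mapping step above can be sketched as a small simulation. The topology, label range, and function names below are assumptions for illustration, not a real LDP implementation.

```python
# Sketch of P2MP LDP LSP establishment: Label Mapping messages travel
# upstream from the leaves, and each node advertises one label toward its
# parent. Branches merge, so a parent propagates only the first mapping.

upstream = {            # child -> parent, rooted at PE1 (Figure 1-1081 style)
    "PE3": "P4", "PE4": "P4", "PE5": "P4",
    "P4": "P1", "P1": "PE1",
}

downstream_labels = {}  # node -> {child: label the child advertised}
next_label = iter(range(1024, 2048))

def send_label_mapping(node):
    """Node advertises a label upstream; the parent records the sub-LSP."""
    parent = upstream.get(node)
    if parent is None:
        return                        # root reached; tree is complete
    label = next(next_label)
    branches = downstream_labels.setdefault(parent, {})
    first_mapping = not branches      # parent propagates upstream only once
    branches[node] = label
    if first_mapping:
        send_label_mapping(parent)

for leaf in ("PE3", "PE4", "PE5"):
    send_label_mapping(leaf)

# P4 is a branch node: it holds one outgoing label per downstream leaf.
assert set(downstream_labels["P4"]) == {"PE3", "PE4", "PE5"}
# P1 propagated a single mapping toward the root PE1.
assert list(downstream_labels["P1"]) == ["P4"]
```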
As shown in Figure 1-1083, P2MP LDP-enabled label switching routers (LSRs) exchange
signaling messages to negotiate mLDP sessions. Two LSRs can successfully negotiate an
mLDP session only if both the LDP Initialization messages carry the P2MP Capability TLV.
After successful negotiation, an mLDP session is established. The mLDP session
establishment process is similar to the LDP session establishment process. The difference is
that the mLDP session establishment involves P2MP capability negotiation.
The P2MP LDP LSP establishment mode varies depending on the node type. A P2MP LDP
LSP contains the following nodes:
Leaf node: manually specified. When configuring a leaf node, you must also specify the
root node IP address and the opaque value.
Transit node: any node that can receive P2MP Label Mapping messages and whose LSR
ID is different from the LSR IDs of the root nodes.
Root node: a node whose host address is the same as the root node's IP address carried in
a P2MP LDP FEC.
The process for deleting a P2MP LDP LSP varies with the node type:
Leaf node
A leaf node sends a Label Withdraw message to its upstream node. After the
upstream node receives the message, it replies with a Label Release message to
instruct the leaf node to tear down the sub-LSP. If the leaf node is the upstream
node's only downstream node, the upstream node sends a Label Withdraw message to
its own upstream node. If the upstream node has another downstream node, it does
not send the Label Withdraw message.
Transit node
If a transit node or an LDP session between a transit node and its upstream node
fails, or a user manually deletes the transit node configuration, the upstream
node of the transit node deletes the sub-LSPs that pass through the transit node.
If the upstream node has another downstream node, it does not send a Label
Withdraw message. If the transit node is the upstream node's only downstream
node, the upstream node sends a Label Withdraw message to its own upstream node.
Root node
If a root node fails or a user manually deletes the LSP configuration on the root node, the
root node deletes the whole LSP.
Other Usage
mLDP P2MP LSPs can transmit services on next generation (NG) multicast VPN (MVPN)
and multicast VPLS networks. In the MVPN or multicast VPLS scenario, NG MVPN
signaling or multicast VPLS signaling triggers the establishment of mLDP P2MP LSPs. There
is no need to manually configure leaf nodes.
Usage Scenarios
mLDP can be used in the following scenarios:
IPTV services are transmitted over an IP/MPLS backbone network.
Multicast virtual private network (VPN) services are transmitted.
The virtual private LAN service (VPLS) is transmitted along a P2MP LDP LSP.
Benefits
mLDP used on an IP/MPLS backbone network offers the following benefits:
Core nodes on the IP/MPLS backbone network can transmit multicast services, without
Protocol Independent Multicast (PIM) configured, which simplifies network deployment.
Uniform MPLS control and forwarding planes are provided for the IP/MPLS backbone
network. The IP/MPLS backbone network can transmit both unicast and multicast VPN
traffic.
Implementation
LDP traffic statistics collection enables the ingress or a transit node to
collect statistics about outgoing LDP LSP traffic, but only for FECs with 32-bit
destination IP address masks (host routes).
In Figure 1-1087, each pair of adjacent devices establishes an LDP session and LDP LSP over
the session. Two LSPs originate from LSRA and are destined for LSRD along the paths LSRA
-> LSRB -> LSRD and LSRA -> LSRB -> LSRC -> LSRD. LSRB is used as an example.
LSRB functions as either a transit node to forward LSRA-to-LSRD traffic or the ingress to
forward LSRB-to-LSRD traffic. LSRB collects statistics about traffic sent by the outbound
interface connected to LSRD and outbound interface connected to LSRC. LSRA can only
function as the ingress, and therefore, collects statistics about traffic only sent by itself. LSRD
can only function as the egress, and therefore, does not collect traffic statistics.
Benefits
No tunnel protection is provided by default for the NG-MVPN over mLDP P2MP
function or the VPLS over mLDP P2MP function. If an LSP fails, traffic can be
restored only through slow, route-change-induced hard convergence. BFD for P2MP
tunnel provides a dual-root mLDP 1+1 protection mechanism for these functions:
primary and backup tunnels are established for VPN traffic. If the primary P2MP
tunnel fails, BFD for mLDP P2MP tunnel rapidly detects the fault and switches
traffic to the backup tunnel, which improves convergence performance and
minimizes traffic loss.
Principles
In Figure 1-1088, a root node uses BFD to send protocol packets to all leaf nodes
along a P2MP LDP LSP. If a leaf node fails to receive BFD packets within a
specified period, it determines that a fault has occurred.
In an NG-MVPN or VPLS scenario shown in Figure 1-1088, each of two roots establishes an
mLDP P2MP tree. PE-AGG1 is the master root, and PE-AGG2 is the backup root. The two
trees do not overlap. BFD for P2MP tunnel is configured on the roots and leaf nodes to
establish BFD sessions. If a BFD session detects a fault in the primary P2MP tunnel, a
forwarder rapidly detects the fault and switches NG-MVPN or VPLS traffic to the backup
P2MP tunnel.
Principles
In a large-scale network, multiple IGP areas usually need to be configured for flexible
network deployment and fast route convergence. When advertising routes between IGP areas,
to prevent a large number of routes from consuming too many resources, an area border router
(ABR) needs to aggregate the routes in the area and then advertise the aggregated route to the
neighbor IGP areas. The LDP extension for inter-area LSP function supports the longest
match rule for looking up routes so that LDP can use aggregated routes to establish inter-area
LDP LSPs.
Figure 1-1089 Networking topology for LDP extension for inter-area LSP
As shown in Figure 1-1089, there are two IGP areas: Area 10 and Area 20.
In the routing table of LSRD on the edge of Area 10, there are two host routes to LSRB and
LSRC. You can use IS-IS to aggregate the two routes to one route to 1.3.0.0/24 and send this
route to Area 20 in order to prevent a large number of routes from occupying too many
resources on the LSRD. Consequently, there is only one aggregated route (1.3.0.0/24) but not
32-bit host routes in LSRA's routing table. By default, when establishing LSPs,
LDP searches the routing table for the route that exactly matches the forwarding
equivalence class (FEC) in the received Label Mapping message. Table 1-332 shows
the routing entry information of LSRA and the routing information carried in the
FEC in the example shown in Figure 1-1089.
Table 1-332 Routing entry information of LSRA and routing information carried in the FEC
Routing entry of LSRA: 1.3.0.0/24
FECs in received Label Mapping messages: 1.3.0.1/32 and 1.3.0.2/32
LDP establishes liberal LSPs, not inter-area LDP LSPs, for aggregated routes. In this
situation, LDP cannot provide required backbone network tunnels for VPN services.
Therefore, in the situation shown in Figure 1-1089, configure LDP to search for routes based
on the longest match rule for establishing LSPs. There is already an aggregated route to
1.3.0.0/24 in the routing table of LSRA. When LSRA receives a Label Mapping
message (for example, one carrying the FEC 1.3.0.1/32) from Area 10, LSRA
searches for a route according to the longest match rule defined in relevant
standards. LSRA then finds the aggregated route to 1.3.0.0/24 and uses the
outbound interface and next hop of this route as those of the route to
1.3.0.1/32. LDP can therefore establish inter-area LDP LSPs.
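The longest match lookup described above can be sketched with the standard-library ipaddress module. The interface names and next-hop addresses below are hypothetical; the prefixes come from the example.

```python
# Longest prefix match: the 32-bit FEC 1.3.0.1/32 has no exact route on
# LSRA, but falls within the aggregated route 1.3.0.0/24, whose outbound
# interface and next hop are then reused for the FEC.
import ipaddress

routing_table = {
    ipaddress.ip_network("1.3.0.0/24"): ("GE0/1/0", "10.1.1.2"),  # aggregate
    ipaddress.ip_network("0.0.0.0/0"):  ("GE0/2/0", "10.2.2.2"),  # default
}

def longest_match(dest: str):
    """Return the outbound interface and next hop of the most specific route."""
    addr = ipaddress.ip_address(dest)
    candidates = [n for n in routing_table if addr in n]
    best = max(candidates, key=lambda n: n.prefixlen)
    return routing_table[best]

# The FEC 1.3.0.1/32 resolves through the /24 aggregate, so LDP can use
# that route's outbound interface and next hop for the inter-area LSP.
assert longest_match("1.3.0.1") == ("GE0/1/0", "10.1.1.2")
assert longest_match("8.8.8.8") == ("GE0/2/0", "10.2.2.2")
```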
In Figure 1-1090, no exact routes between LSRA and LSRC are configured; a default
route to 0.0.0.0 through LSRB is used instead. A remote LDP session in DoD mode
is established between LSRA and LSRC. Before an LSP is established between
the two LSRs, LSRA uses the longest match rule to query the next-hop IP address and sends a
Label Request packet to the downstream LSR. Upon receipt of the Label Request packet, the
transit LSRB checks whether an exact route to LSRC exists. If no exact route is configured
and the longest match function is enabled, LSRB uses the longest match function to find a
route and establish an LSP over the route.
A remote LDP session in DoD mode may also be established when LSRA does not find
an exact route to the remote IP address. In this situation, after the IP address
of a remote peer is specified on LSRA, LSRA uses the longest match function to
automatically send a Label Request packet to the remote peer to request a DoD
label.
1.12.3.3 Applications
1.12.3.3.1 mLDP Applications in an IPTV Scenario
Service Overview
The IP or Multiprotocol Label Switching (MPLS) technology has become a mainstream
bearer technology on backbone networks, and the demands for multicast services (for
example, IPTV) transmitted over bearer networks are evolving. Carriers draw on the existing
MPLS mLDP technique to provide the uniform MPLS control and forwarding planes for
multicast services transmitted over backbone networks.
Networking Description
mLDP is deployed on IP/MPLS backbone networks. Figure 1-1091 illustrates mLDP
applications in an IPTV scenario.
Feature Deployment
The procedure for deploying end-to-end (E2E) IP multicast services to be transmitted along
mLDP label switched paths (LSPs) is as follows:
Establish an mLDP LSP.
Perform the following steps:
a. Plan the root, transit, and leaf nodes on an mLDP LSP.
b. Configure leaf nodes to send requests to the root node to establish
point-to-multipoint (P2MP) LDP LSPs.
c. Configure a virtual tunnel interface and bind the LSP to it.
Import multicast services into the LSP.
Configure the quality of service (QoS) redirection function on the ingress PE1 to direct
data packets sent by a multicast source to the specified mLDP LSP.
Forward multicast services.
To enable the egresses (PE2 and PE3) to forward multicast services, perform the
following operations:
− Configure the egresses to run Protocol Independent Multicast (PIM) to generate
multicast forwarding entries.
− Enable the egresses to ignore the Unicast Reverse Path Forwarding (URPF) check.
This is because the URPF check fails as PIM does not need to be run on core nodes
on the P2MP LDP network.
− Enable multicast source proxy based on the location of the Rendezvous Point (RP).
After multicast data packets for a multicast group in an any-source multicast (ASM)
address range are directed to an egress, the egress checks the packets based on
unicast routes. Multicast source proxy is enabled or disabled based on the following
check results:
If the egress is indirectly connected to a multicast source and does not function
as the RP to which the group corresponds, the egress stops forwarding
multicast data packets. As a result, downstream hosts cannot receive these
multicast data packets. Multicast source proxy can be used to address this
problem. Multicast source proxy enables the egress to send a Register message
to the RP deployed on a source-side device (for example, SR1) in a PIM
domain. The RP adds the egress to a rendezvous point tree (RPT) to enable the
egress to forward multicast data packets to the downstream hosts.
If the egress is directly connected to a multicast source or functions as the RP
to which the group corresponds, the egress can forward multicast data packets,
without multicast source proxy enabled.
1.12.4 MPLS TE
1.12.4.1 Introduction
Multiprotocol Label Switching (MPLS) traffic engineering (TE) effectively schedules,
allocates, and uses existing network resources to provide sufficient bandwidth and support for
quality of service (QoS). MPLS TE helps carriers minimize expenditures without requiring
hardware upgrades. TE is implemented based on MPLS techniques and is easy to deploy and
maintain on live networks. MPLS TE supports a range of reliability techniques, which helps
backbone networks achieve carrier- and device-class reliability.
Purpose
Traffic engineering techniques are common for carriers operating IP/MPLS bearer networks.
These techniques are used to prevent traffic congestion and uneven resource allocation.
A node on a conventional IP network selects the shortest path as an optimal route, regardless
of other factors, for example, bandwidth. The shortest path may be congested with traffic,
whereas other available paths are idle.
Each link on the network shown in Figure 1-1092 has a bandwidth of 100 Mbit/s and
the same metric value. LSRA sends traffic to LSRJ at 40 Mbit/s, and LSRG sends
traffic to LSRJ at 80 Mbit/s. Traffic from both routers travels through the
shortest path LSRA (LSRG) → LSRB → LSRC → LSRD → LSRI → LSRJ calculated by an
Interior Gateway Protocol (IGP). As a result, this path may be congested because
of overload, while the path LSRA (LSRG) → LSRB → LSRE → LSRF → LSRH → LSRI →
LSRJ is idle.
Network congestion is a major cause of backbone network performance
deterioration. Congestion results either from insufficient resources or from
incorrect local resource allocation. In the former case, expanding network device
capacity prevents the problem. In the latter case, TE is used to divert some
traffic to idle links so that traffic allocation is improved.
TE dynamically monitors network traffic and loads on network elements and adjusts the
parameters for traffic management, routing, and resource constraints in real time, which
prevents network congestion induced by load imbalance.
Conventional TE solutions are as follows:
TE controls network traffic by adjusting the metric of a path. This method eliminates
congestion only on some links. Adjusting a metric is difficult on a complex network
because a link change affects multiple routes.
TE directs some traffic to virtual connections (VCs) based on an overlay model. The
current IGPs are topology driven and applicable to only static network connections,
regardless of dynamic factors, such as bandwidth and traffic attributes.
The overlay model, such as IP over asynchronous transfer mode (ATM), complements
IGP disadvantages. An overlay model provides a virtual topology over a physical
topology for a network. This helps properly adjust traffic and implement QoS features,
but has high costs and poor extensibility.
A scalable and simple solution is required to implement TE on a large-scale
network. MPLS, as an overlay model, allows a virtual topology to be established
over a physical topology and maps traffic onto the virtual topology. Because
MPLS integrates readily with TE, MPLS TE was introduced.
Definition
MPLS TE establishes label switched paths (LSPs) based on constraints and conducts traffic to
specific LSPs so that network traffic is transmitted along the specified path. The constraints
include controllable paths and sufficient link bandwidth reserved for services transmitted over
the LSPs. If resources are insufficient, higher-priority LSPs preempt resources, such as
bandwidth, of lower-priority LSPs so that higher-priority services' requirements can be
fulfilled preferentially. In addition, if an LSP fails or a node is congested, MPLS TE protects
network communication using a backup path and the fast reroute (FRR) function. Using
MPLS TE allows a network administrator to deploy LSPs to properly allocate network
resources, which prevents network congestion. If the number of LSPs increases, a specific
offline tool can be used to analyze traffic. MPLS TE can be used on the network shown in
Figure 1-1092 to address congestion. MPLS TE establishes an 80 Mbit/s LSP over the path
LSRG → LSRB → LSRC → LSRD → LSRI → LSRJ and a 40 Mbit/s LSP over the path
LSRA → LSRB → LSRE → LSRF → LSRH → LSRI → LSRJ. MPLS TE directs traffic to
the two LSPs, preventing congestion.
Function Description
Basic function: Includes basic MPLS TE settings and the tunnel establishment
capability.
Tunnel optimization: Allows existing tunnels to be reestablished over other
paths if the topology changes, or to be reestablished with updated bandwidth if
service bandwidth values change.
Reliability function: Supports path protection, local protection, and node
protection.
Security: Supports Resource Reservation Protocol (RSVP) authentication, which
improves signaling security over MPLS TE networks.
P2MP TE: Point-to-multipoint (P2MP) traffic engineering (TE) is a promising
solution to multicast service transmission. P2MP TE helps carriers provide high
TE capabilities and increased reliability on an IP/MPLS backbone network and
reduce network operational expenditure (OPEX).
Benefits
MPLS TE offers the following benefits:
Provides sufficient bandwidth and supports QoS capabilities for services.
Optimizes bandwidth allocation.
Establishes public network tunnels to isolate virtual private network (VPN) traffic.
Is implemented based on existing MPLS techniques and its deployment and maintenance
are simple.
Related Concepts
Concept Description
MPLS TE tunnel: Multiple LSPs are bound together to form an MPLS TE tunnel. An
MPLS TE tunnel is uniquely identified by the following parameters:
Tunnel interface: a P2P virtual interface that encapsulates packets. Similar to
a loopback interface, a tunnel interface is a logical interface. A tunnel
interface name is identified by an interface type and number. The interface type
is "tunnel." The interface number is expressed in the format of
SlotID/CardID/PortID.
Tunnel ID: a decimal number that identifies an MPLS TE tunnel and facilitates
tunnel planning and management. A tunnel ID must be specified before an MPLS TE
tunnel interface is configured.
(CR-LSPs).
Unlike Label Distribution Protocol (LDP) LSPs that are established using
routing information, CR-LSPs are established based on bandwidth and
path constraints in addition to routing information.
Traffic Forwarding Component (1.12.4.2.7): Directs traffic to a CR-LSP and
forwards the traffic along the CR-LSP. Although a CR-LSP can be established
using the preceding three components, the CR-LSP cannot automatically import
traffic. The traffic forwarding component can be used to direct traffic to the
CR-LSP.
A network administrator can configure link and tunnel attributes to enable MPLS TE to
automatically establish a CR-LSP. The network administrator can then direct traffic to the
CR-LSP and forward traffic over the CR-LSP.
Related Concepts
Information Advertisement Component involves the following concepts:
Concept Description
Total link bandwidth: Manually set for a physical link.
Maximum reservable bandwidth: The maximum bandwidth that a link can reserve for
MPLS TE tunnels to be established. The maximum reservable bandwidth must be
lower than or equal to the total link bandwidth and can be manually set.
TE metric: A TE metric is used in TE tunnel path calculation, allowing the
calculation process to be independent of IGP route-based path calculation. The
IGP metric is used for MPLS TE tunnels by default.
SRLG: A shared risk link group (SRLG) is a set of links that are likely to fail
concurrently because they share a physical resource (for example, an optical
fiber). Links in an SRLG share the same risk of faults: if one link fails, the
other links in the SRLG may also fail. An SRLG enhances CR-LSP reliability on an
MPLS TE network enabled with CR-LSP hot standby or TE FRR. For more information
about the SRLG, see 1.12.4.5.6 SRLG.
Link administrative group: A link administrative group is also called a link
color. It is a 32-bit vector, with each bit set to a specified value that is
associated with a desired meaning. For example, a link administrative group
attribute can be configured to describe link bandwidth, a performance parameter
(such as the delay time), or a management policy. The policy can be a traffic
type (multicast, for example) or a flag indicating that an MPLS TE tunnel passes
over the link. The link administrative group attribute is used together with
affinities to control the paths for tunnels.
Contents to Be Advertised
The network resource information to be advertised includes the following items:
Link status information: interface IP addresses, link types, and link metric values, which
are collected by an Interior Gateway Protocol (IGP)
Bandwidth information, such as maximum link bandwidth and maximum reservable
bandwidth
TE metric: TE link metric, which is the same as the IGP metric by default
Administrative group
SRLG
Advertisement Methods
Either of the following link status protocol extensions can be used to advertise TE
information:
1.10.8.2.10 IS-IS TE
1.10.6.2.5 OSPF TE
Open Shortest Path First (OSPF) TE and Intermediate System to Intermediate System (IS-IS)
TE automatically collect TE information and flood it to MPLS TE nodes.
Figure 1-1095 shows the proportion of the bandwidth reserved for each MPLS TE
tunnel to the available bandwidth in the TEDB.
Bandwidth flooding is not performed when tunnels 1 to 9 are created. After tunnel 10 is
created, the bandwidth information (10 Mbit/s in total) on tunnels 1 to 10 is flooded. The
available bandwidth is 90 Mbit/s. Similarly, no bandwidth information is flooded after
tunnels 11 to 18 are created. After tunnel 19 is created, bandwidth information on tunnels
11 to 19 is flooded. The process repeats until tunnel 100 is established.
Figure 1-1095 Proportion of the bandwidth reserved for each MPLS TE tunnel to the
available bandwidth in the TEDB
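The threshold-based flooding illustrated above can be sketched as a small simulation. The 10% threshold and 1 Mbit/s tunnel size are taken from the figure's example; the variable names are assumptions.

```python
# Sketch of threshold-based bandwidth flooding: reserved bandwidth is
# flooded only when the unflooded reservations reach 10% of the last
# advertised available bandwidth, avoiding a flood per tunnel.

advertised_available = 100.0    # Mbit/s, what the TEDB currently believes
unflooded = 0.0                 # reservations made since the last flood
flood_events = []

for tunnel_id in range(1, 101): # 100 tunnels of 1 Mbit/s each
    unflooded += 1.0
    if unflooded >= 0.10 * advertised_available:
        advertised_available -= unflooded   # flood the updated bandwidth
        unflooded = 0.0
        flood_events.append(tunnel_id)

# No flooding for tunnels 1-9; tunnel 10 triggers the first flood
# (available drops to 90 Mbit/s); with a threshold of 9 Mbit/s, the next
# flood happens at tunnel 19, matching the process described above.
assert flood_events[:2] == [10, 19]
```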
Related Concepts
Path Calculation Component involves the following concepts:
Concept Description
Bandwidth Bandwidth values are planned based on services that are to pass through a
tunnel. The configured bandwidth is reserved on each node through which a
tunnel passes.
Affinity An affinity is a 32-bit vector, configured on the ingress of a tunnel. It must be
attribute used together with a link administrative group attribute.
After a tunnel is configured with an affinity, a device compares the affinity
with the administrative group value during link selection to determine
whether a link with specified attributes is selected or not. The link selection
criteria are as follows:
The result of ANDing the IncludeAny affinity with the administrative group
value is not 0.
The result of ANDing the ExcludeAny affinity with the administrative group
value is 0.
IncludeAny equals the affinity attribute ANDed with the mask; ExcludeAny equals
the complement of the affinity attribute ANDed with the mask; the administrative
group value used in the comparison equals the administrative group value ANDed
with the mask.
The following rules apply:
If some bits in the mask are 1s, the corresponding bits are compared: among the
bits where the affinity is 1, at least one corresponding administrative group
bit must be 1; where the affinity bits are 0s, the corresponding administrative
group bits cannot be 1.
For example, an affinity is 0x0000FFFF and its mask is 0xFFFFFFFF.
The higher-order 16 bits in the administrative group of available links are
0 and at least one of the lower-order 16 bits is 1. This means the
administrative group attribute ranges from 0x00000001 to 0x0000FFFF.
If some bits in a mask are 0s, the corresponding bits in the administrative
group are not compared with the affinity bits.
For example, an affinity is 0xFFFFFFFF and its mask is 0xFFFF0000. At
least one of the higher-order 16 bits in an administrative group attribute is
1 and the lower-order 16 bits can be 0s and 1s. This means that the
administrative group attribute ranges from 0x00010000 to 0xFFFFFFFF.
NOTE
Understand specific comparison rules before deploying devices of different vendors
because the comparison rules vary with the vendor.
A network administrator can use the link administrative group and affinities
to control the paths over which MPLS TE tunnels are established.
Explicit An explicit path used to establish a CR-LSP. Nodes to be included or
path excluded are specified on this path. Explicit paths are classified into the
following types:
Strict explicit path
A hop is directly connected to its next hop on a strict explicit path. By
specifying a strict explicit path, the most accurate path is provided for a
CR-LSP.
Loose explicit path
A hop and its next hop are not necessarily directly connected on a loose
explicit path.
For example, a CR-LSP is set up over a loose explicit path between LSRA
and LSRF on the network shown in Figure 1-1097. LSRA is the ingress,
and LSRF is the egress. "D loose" indicates that the CR-LSP must pass through
LSRD but that LSRD and LSRA are not necessarily directly connected; other LSRs
may exist between LSRD and LSRA.
Hop limit Hop limit is a condition for path selection during CR-LSP establishment.
Similar to the administrative group and affinity attributes, a hop limit defines
the number of hops that a CR-LSP allows.
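The affinity comparison rules above can be sketched in a few lines. This is a hedged illustration of the Huawei-style rules described in this section (the note above warns that other vendors compare differently); the function name is an assumption.

```python
# Sketch of the affinity/administrative group comparison: bits where the
# mask is 0 are ignored; among masked bits, at least one affinity-1 bit
# must match the administrative group, and no affinity-0 bit may be set.

MASK32 = 0xFFFFFFFF

def link_matches(affinity: int, mask: int, admin_group: int) -> bool:
    include_any = affinity & mask              # bits that must find a match
    exclude_any = ~affinity & mask & MASK32    # bits that must stay clear
    admin = admin_group & mask                 # unmasked bits are ignored
    if include_any and not (include_any & admin):
        return False                           # no required bit is set
    if exclude_any & admin:
        return False                           # a forbidden bit is set
    return True

# Affinity 0x0000FFFF, mask 0xFFFFFFFF: the administrative group must lie
# in 0x00000001..0x0000FFFF, as in the first example above.
assert link_matches(0x0000FFFF, 0xFFFFFFFF, 0x00000001)
assert not link_matches(0x0000FFFF, 0xFFFFFFFF, 0x00010000)

# Affinity 0xFFFFFFFF, mask 0xFFFF0000: at least one of the higher-order
# 16 bits must be 1, as in the second example above.
assert link_matches(0xFFFFFFFF, 0xFFFF0000, 0x00010000)
assert not link_matches(0xFFFFFFFF, 0xFFFF0000, 0x0000FFFF)
```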
CSPF Fundamentals
CSPF works based on the following parameters:
Tunnel attributes configured on an ingress to establish a CR-LSP
Traffic engineering database (TEDB)
A TEDB can be generated only after Interior Gateway Protocol (IGP) TE is configured. On an IGP
TE-incapable network, CR-LSPs are established based on IGP routes, but not CSPF calculation results.
CSPF attempts to use the OSPF TEDB to establish a path for a CR-LSP by default. If a path is
successfully calculated using OSPF TEDB information, CSPF completes calculation and does not use
the IS-IS TEDB to calculate a path. If path calculation fails, CSPF attempts to use IS-IS TEDB
information to calculate a path.
CSPF can be configured to use the IS-IS TEDB to calculate a CR-LSP path. If path
calculation fails, CSPF uses the OSPF TEDB to calculate a path.
CSPF calculates the shortest path to a destination. If there are several shortest paths with the
same metric, CSPF uses a tie-breaking policy to select one of them. The following
tie-breaking policies for selecting a path are available:
Most-fill: selects a link with the highest proportion of used bandwidth to the maximum
reservable bandwidth, efficiently using bandwidth resources.
Least-fill: selects a link with the lowest proportion of used bandwidth to the maximum
reservable bandwidth, evenly using bandwidth resources among links.
Random: selects links randomly, allowing LSPs to be established evenly over links,
regardless of bandwidth distribution.
When several links have the same proportion of used bandwidth to the maximum reservable
bandwidth (for example, the links do not use the reserved bandwidths or the same bandwidth
is used on every link), the link discovered first is selected, irrespective of whether most-fill or
least-fill is configured.
For example, CSPF removes links marked blue and links each with bandwidth of 50 Mbit/s
based on tunnel constraints and uses other links each with bandwidth of 100 Mbit/s to
calculate a path for an MPLS TE tunnel on the network shown in Figure 1-1098. The
constraints include the destination LSRE, bandwidth of 80 Mbit/s, and a transit node LSRH.
CSPF calculates a path shown in Figure 1-1099 in the same way SPF would calculate it.
RSVP-TE Messages
RSVP-TE messages are as follows:
Path message: used to request downstream nodes to distribute labels. A Path message
records path information on each node through which the message passes. The path
information is used to establish a path state block (PSB) on a node.
Resv message: used to reserve resources at each hop of a path. A Resv message carries
information about resources to be reserved. Each node that receives the Resv message
reserves resources based on reservation information carried in the message. The
reservation information is used to establish a reservation state block (RSB) and to record
information about distributed labels.
PathErr message: sent upstream by an RSVP node if an error occurs during the
processing of a Path message. A PathErr message is forwarded by every transit node and
arrives at the ingress.
ResvErr message: sent downstream by an RSVP node if an error occurs during the
processing of a Resv message. A ResvErr message is forwarded by every transit node
and arrives at the egress.
PathTear message: sent downstream by the ingress to delete information about the local
state created on every node of the path.
ResvTear message: sent upstream by the egress to delete the local reserved resources
assigned to a path. After receiving the ResvTear message, the ingress sends a PathTear
message to the egress.
Figure 1-1100 shows the process of establishing a CR-LSP. The process is as follows:
1. The ingress configured with RSVP-TE creates a PSB and sends a Path message to transit
nodes.
2. After receiving the Path message, the transit node processes and forwards this message,
and creates a PSB.
3. After receiving the Path message, the egress creates a PSB, uses bandwidth reservation
information in the Path message to generate a Resv message, and sends the Resv
message to the ingress.
4. After receiving the Resv message, the transit node processes and forwards the Resv
message and creates an RSB.
5. After receiving the Resv message, the ingress creates an RSB and confirms that the
resources are reserved successfully.
6. The ingress successfully establishes a CR-LSP to the egress.
If the CR-LSP is reestablished over a new path, the ingress sends a PathTear message
downstream to delete the soft states maintained on nodes of the previous path.
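The six-step exchange above can be sketched as a toy simulation; the dict-based state blocks and label values are illustrative assumptions, not an RSVP-TE implementation.

```python
def establish_cr_lsp(nodes):
    """nodes: ordered list of node names, ingress first, egress last."""
    psb = {}  # path state blocks, one per node
    rsb = {}  # reservation state blocks, one per node
    # The Path message travels downstream; each node creates a PSB that
    # records the path information accumulated so far.
    for node in nodes:
        psb[node] = {"recorded_route": nodes[: nodes.index(node) + 1]}
    # The Resv message travels upstream; each node creates an RSB and
    # records the label distributed toward its upstream neighbor.
    label = 100  # hypothetical starting label value
    for node in reversed(nodes):
        rsb[node] = {"label": label}
        label += 1
    return psb, rsb

psb, rsb = establish_cr_lsp(["ingress", "transit", "egress"])
print(psb["egress"]["recorded_route"])  # ['ingress', 'transit', 'egress']
print(rsb["egress"]["label"])           # 100
```

When the Resv message reaches the ingress and its RSB is created, resource reservation is confirmed and the CR-LSP is up, matching step 5 and step 6 above.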
Reservation Styles
A reservation style defines how a node reserves resources after receiving a request sent by an
upstream node. The NE20E supports the following reservation styles:
Fixed filter (FF): defines a distinct bandwidth reservation for data packets from a
particular transmit end.
Shared explicit (SE): defines a single reservation for a set of selected transmit ends.
These senders share one reservation but are assigned different labels by the receive end.
Background
RSVP Refresh messages are used to synchronize path state block (PSB) and reservation state
block (RSB) information between nodes. They can also be used to monitor the reachability
between RSVP neighbors and maintain RSVP neighbor relationships. Because Path and
Resv messages are large, sending many of them to maintain many CR-LSPs consumes
considerable network resources. RSVP Srefresh can be used to address this problem.
Implementation
RSVP Srefresh defines new objects based on the existing RSVP protocol:
Message_ID extension and retransmission extension
The Srefresh extension builds on the Message_ID extension. According to the
Message_ID extension mechanism defined in relevant standards, RSVP messages carry
extended objects, including Message_ID and Message_ID_ACK objects. The two
objects are used to confirm RSVP messages and support reliable RSVP message
delivery.
The Message_ID object can also be used to provide the RSVP retransmission mechanism.
For example, a node initializes a retransmission interval as Rf seconds after it sends an
RSVP message carrying the Message_ID object. If the node receives no ACK message
within Rf seconds, the node retransmits an RSVP message after (1 + Delta) x Rf seconds.
Delta determines how quickly the sender's retransmission interval grows.
The node keeps retransmitting the message until it receives an ACK message or the
number of retransmissions reaches the configured limit (called the retransmission increment value).
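The retransmission timing described above can be sketched as follows: the first retransmission waits Rf seconds, and each subsequent interval grows by a factor of (1 + Delta), up to a cap on the number of attempts.

```python
def retransmit_schedule(rf, delta, max_retries):
    """Return the successive wait intervals (in seconds) before each
    retransmission, assuming no ACK ever arrives."""
    intervals = []
    interval = rf
    for _ in range(max_retries):
        intervals.append(interval)
        interval *= (1 + delta)  # each interval grows by (1 + Delta)
    return intervals

# With Rf = 1 s and Delta = 1, the intervals double: 1, 2, 4 seconds.
print(retransmit_schedule(1.0, 1.0, 3))  # [1.0, 2.0, 4.0]
```

In practice an ACK stops the schedule early; the sketch only shows the worst case where every retransmission times out.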
Summary Refresh extension
The Summary Refresh extension supports Srefresh messages to update the RSVP status,
without the transmission of standard Path or Resv messages.
Each Srefresh message carries a Message_ID object. Each object contains multiple
message IDs, each of which identifies a Path or Resv state to be refreshed. If a CR-LSP
changes, its message ID value increases.
Only the state that was previously advertised by Path and Resv messages containing
Message_ID objects can be refreshed using the Srefresh extension.
After a node receives an Srefresh message, the node compares the Message_ID with that
saved in a local state block. If they match, the node does not change the state. If the
Message_ID is greater than that saved in the local state block, the node sends a NACK
message to the sender, refreshes the PSB or RSB based on the Path or Resv message, and
updates the Message_ID.
The sequence number carried in a Message_ID object enables this comparison: if a
CR-LSP changes, the associated sequence number increases. A received sequence number
equal to the one saved in the PSB or RSB therefore means the state is unchanged, and a
greater one means the state has been updated.
Background
RSVP Refresh messages are used to synchronize path state block (PSB) and reservation state
block (RSB) information between nodes. They can also be used to monitor the reachability
between RSVP neighbors and maintain RSVP neighbor relationships.
Using Path and Resv messages to monitor neighbor reachability delays a traffic switchover if
a link fault occurs and therefore is slow. The RSVP Hello extension can address this problem.
Related Concepts
RSVP Refresh messages: Although an MPLS TE tunnel is established using Path and
Resv messages, RSVP nodes still send Path and Resv messages over the established
tunnel to update the RSVP status. These Path and Resv messages are called RSVP
Refresh messages.
RSVP GR: ensures uninterrupted transmission on the forwarding plane while an
AMB/SMB switchover is performed on the control plane. A GR helper assists a GR
restarter in rapidly restoring the RSVP status.
TE FRR: a local protection mechanism for MPLS TE tunnels. If a fault occurs on a
tunnel, TE FRR rapidly switches traffic to a bypass tunnel.
Implementation
The principles of the RSVP Hello extension are as follows:
1. Hello handshake mechanism
LSRA and LSRB are directly connected on the network shown in Figure 1-1101.
− If RSVP Hello is enabled on LSRA, LSRA sends a Hello Request message to
LSRB.
− After LSRB receives the Hello Request message and is also enabled with RSVP
Hello, LSRB sends a Hello ACK message to LSRA.
− After receiving the Hello ACK message, LSRA considers LSRB reachable.
2. Detecting neighbor loss
After a successful Hello handshake, LSRA and LSRB exchange Hello
messages. If LSRB does not respond to three consecutive Hello Request messages sent
by LSRA, LSRA considers LSRB lost and re-initializes the RSVP Hello process.
3. Detecting neighbor restart
If LSRA and LSRB are enabled with RSVP GR, and the Hello extension detects that
LSRB is lost, LSRA waits for LSRB to send a Hello Request message carrying a GR
extension. After receiving the message, LSRA starts the GR process on LSRB and sends
a Hello ACK message to LSRB. After receiving the Hello ACK message, LSRB
performs the GR process and restores the RSVP soft state. LSRA and LSRB exchange
Hello messages to maintain the restored RSVP soft state.
If GR is disabled and FRR is enabled, FRR switches traffic to a bypass CR-LSP after the Hello
extension detects that the RSVP neighbor relationship is lost to ensure proper traffic transmission.
If GR is enabled, the GR process is performed.
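The loss-detection rule in step 2 above can be sketched as follows; the counter handling is an illustration, not the NE20E implementation.

```python
class HelloNeighbor:
    """Tracks an RSVP Hello neighbor; the neighbor is declared lost
    after three consecutive unanswered Hello Requests."""
    MISS_LIMIT = 3

    def __init__(self):
        self.missed = 0
        self.lost = False

    def request_timed_out(self):
        """Called when a Hello Request goes unanswered."""
        self.missed += 1
        if self.missed >= self.MISS_LIMIT:
            self.lost = True  # re-initialize the RSVP Hello process

    def ack_received(self):
        """A Hello ACK resets the miss counter."""
        self.missed = 0
        self.lost = False

nbr = HelloNeighbor()
for _ in range(3):
    nbr.request_timed_out()
print(nbr.lost)  # True
```

Once `lost` is set, the reaction depends on the configuration described above: FRR switches traffic to the bypass CR-LSP, or GR waits for the neighbor's Hello Request carrying the GR extension.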
Deployment Scenarios
The RSVP Hello extension applies to networks enabled with both RSVP GR and TE FRR.
Static Route
A static route is the simplest method of directing traffic to a CR-LSP in an MPLS TE tunnel.
A TE static route works in the same way as a common static route but has a TE tunnel
interface as its outbound interface.
Auto Route
An Interior Gateway Protocol (IGP) uses an auto route related to a CR-LSP in a TE tunnel
that functions as a logical link to calculate a path. The tunnel interface is used as an outbound
interface in the auto route. The TE tunnel is considered a P2P link with a specified metric
value. The following auto routes are supported:
IGP shortcut: A route related to a CR-LSP is not advertised to neighbor nodes,
preventing other nodes from using the CR-LSP.
Forwarding adjacency: A route related to a CR-LSP is advertised to neighbor nodes,
allowing these nodes to use the CR-LSP.
The forwarding adjacency advertises CR-LSP routes with neighbor IP addresses by
sending link-state advertisements (LSAs) or IS-IS link state packets (LSPs). Type 10
Opaque LSAs carry the neighbor IP addresses in the Remote IP Address
sub-type-length-value (sub-TLV), and LSPs carry the neighbor IP addresses in
intermediate system (IS) reachability TLV's Remote IP Address sub-TLV.
If the forwarding adjacency is used, nodes on both ends of a CR-LSP must be in the
same area.
The following example demonstrates the IGP shortcut and forwarding adjacency.
Figure 1-1102 Schematic diagram for IGP shortcut and forwarding adjacency
A CR-LSP over the path LSRG → LSRF → LSRB is established on the network shown in
Figure 1-1102, and the TE metric values are specified. Either of the following configurations
can be used:
The auto route is not used. LSRE uses LSRD as the next hop in a route to LSRA and a
route to LSRB; LSRG uses LSRF as the next hop in a route to LSRA and a route to
LSRB.
The auto route is used. Either IGP shortcut or forwarding adjacency can be configured:
− The IGP shortcut is used to advertise the route of Tunnel 1. LSRE uses LSRD as the
next hop in the route to LSRA and the route to LSRB; LSRG uses Tunnel 1 as the
next hop in the route to LSRA and the route to LSRB. LSRG, unlike LSRE, uses
Tunnel 1 in IGP path calculation.
− The forwarding adjacency is used to advertise the route of Tunnel 1. LSRE uses
LSRG as the next hop in the route to LSRA and the route to LSRB; LSRG uses
Tunnel 1 as the next hop in the route to LSRA and the route to LSRB. Both LSRE
and LSRG use Tunnel 1 in IGP path calculation.
Policy-based Routing
Policy-based routing (PBR) allows the system to select routes based on user-defined
policies, improving security and balancing traffic load. If PBR is enabled on an MPLS
network, IP packets are forwarded over specific CR-LSPs based on PBR rules.
Like IP unicast PBR, MPLS TE PBR is implemented based on a set of matching rules
and behaviors. The rules and behaviors are defined using an apply clause, in which the
outbound interface is a specific tunnel interface. Packets that do not match the PBR rules
are forwarded using ordinary IP routing; packets that match the rules are forwarded over
specific CR-LSPs.
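The match-then-forward decision can be sketched as follows; the rule representation and packet fields are hypothetical, not the NE20E apply-clause syntax.

```python
def pbr_next_hop(packet, rules):
    """rules: list of (match_fn, tunnel_interface) pairs, checked in
    order; returns the outbound tunnel interface for a matching packet,
    or None to fall back to ordinary IP forwarding."""
    for match, tunnel in rules:
        if match(packet):
            return tunnel
    return None  # no rule matched: forward using ordinary IP routing

# Hypothetical rule: send EF-marked traffic (DSCP 46) over Tunnel1.
rules = [(lambda p: p.get("dscp") == 46, "Tunnel1")]
print(pbr_next_hop({"dscp": 46}, rules))  # Tunnel1
print(pbr_next_hop({"dscp": 0}, rules))   # None
```

Returning None models the fallback described above: unmatched packets are properly forwarded using IP.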
Tunnel Policy
Tunnel policies applied to virtual private networks (VPNs) guide VPN traffic to tunnels in
either of the following modes:
Select-seq mode: The system selects tunnels for VPN traffic in the specified tunnel
selection sequence.
Tunnel binding mode: A CR-LSP is bound to a destination address in a tunnel policy.
This policy applies only to CR-LSPs.
If hard preemption is used, since Tunnel 1 has a higher priority than Tunnel 2, LSRF
sends an RSVP message to tear down Tunnel 2. As a result, some traffic on Tunnel 2 is
dropped if Tunnel 2 is transmitting traffic.
If soft preemption is used, LSRF sends LSRC a Resv message. After LSRC receives this
message, LSRC reestablishes Tunnel 2 over another path
LSRC→LSRE→LSRD→LSRB. LSRC switches traffic to the new path before tearing
down Tunnel 2 over the original path.
Background
A tunnel affinity and a link administrative group attribute are 32-bit hexadecimal numbers. An
IGP (IS-IS or OSPF) advertises the administrative group attribute to devices in the same IGP
area. RSVP-TE advertises the tunnel affinity to downstream devices. CSPF on the ingress
checks whether administrative group bits match affinity bits to determine whether a link can
be used to establish a CR-LSP.
Hexadecimal calculations are complex during network deployment, and maintaining and
querying tunnels established using hexadecimal calculations are difficult. Each bit in a
hexadecimal-number affinity can be assigned a name. In this example, colors are used to
name affinity bits. Naming affinity bits helps verify that tunnel affinity bits match link
administrative group bits, thereby facilitating network planning and deployment.
Implementation
An affinity name template can be configured to manage the mapping between affinity bits and
names. Configuring the same template on all nodes on an MPLS network is recommended.
Inconsistent configuration may cause a service provision failure. You can name each of 32
affinity bits differently. As shown in Figure 1-1104, the affinity bits are named using colors:
the second affinity bit is "red", the fourth bit is "blue", and the sixth bit is "brown".
Bits in a link administrative group must be assigned the same names as the affinity bits.
Once affinity bits are named, you can then determine which links a CR-LSP can include or
exclude on the ingress. Rules for selecting links are as follows:
include-any: CSPF includes a link when calculating a path, if at least one link
administrative group bit has the same name as an affinity bit.
exclude: CSPF excludes a link when calculating a path, if any link administrative group
bit has the same name as an affinity bit.
include-all: CSPF includes a link when calculating a path, only if each link
administrative group bit has the same name as each affinity bit.
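The three rules map naturally onto 32-bit bitmask operations; the bit positions chosen below for the color names are hypothetical.

```python
def include_any(admin_group, affinity):
    """Link is usable if at least one named bit is shared."""
    return (admin_group & affinity) != 0

def exclude(admin_group, affinity):
    """Link passes the exclude check only if no named bit is shared."""
    return (admin_group & affinity) == 0

def include_all(admin_group, affinity):
    """Link is usable only if every affinity bit is set in the
    link administrative group."""
    return (admin_group & affinity) == affinity

RED, BLUE = 0x2, 0x8  # hypothetical bit positions for the named bits
link = RED | BLUE     # link administrative group with "red" and "blue" set

print(include_any(link, RED))         # True: "red" is shared
print(include_all(link, RED | BLUE))  # True: both bits are present
print(exclude(link, 0x10))            # True: no shared bit, link kept
```

Because both sides compare names rather than raw hexadecimal values, the same bit positions must be configured consistently network-wide, as the template recommendation above states.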
Usage Scenarios
The affinity naming function is used when CSPF calculates paths over which RSVP-TE
establishes CR-LSPs.
Benefits
The affinity naming function allows you to easily and rapidly use affinity bits to control paths
over which CR-LSPs are established.
Background
MPLS TE tunnels are used to optimize traffic distribution over a network. An MPLS TE
tunnel is configured using static information, such as a bandwidth setting and a calculated
path. Without the optimization function, an MPLS TE tunnel cannot be automatically updated
after the service bandwidth or a tunnel management policy changes. This wastes network
resources. MPLS TE tunnels need to be optimized after being established.
Implementation
A specific event that occurs on the ingress can trigger optimization for a CR-LSP bound to an
MPLS TE tunnel. The optimization enables the CR-LSP to be reestablished over the optimal
path with the smallest metric.
The fixed filter (FF) reservation style and CR-LSP re-optimization cannot be configured together.
Although re-optimization can be successfully configured for a CR-LSP that is established over an
explicit path, the configuration does not take effect.
Background
MPLS TE tunnels are used to optimize traffic distribution over a network. Traffic that
frequently changes wastes MPLS TE tunnel bandwidth; therefore, automatic bandwidth
adjustment is used to prevent this waste. A bandwidth is initially set to meet the requirement
for the maximum volume of services to be transmitted over an MPLS TE tunnel, to ensure
uninterrupted transmission.
Related Concepts
Automatic bandwidth adjustment allows the ingress to dynamically detect bandwidth changes
and periodically attempt to reestablish a tunnel with the needed bandwidth.
Table 1-338 lists concepts and their descriptions.
Implementation
Automatic bandwidth adjustment is enabled on a tunnel interface of the ingress. The
automatic bandwidth adjustment procedure on the ingress is as follows:
1. Samples traffic.
The ingress starts a bandwidth adjustment timer (A) and samples traffic at a specific
interval (B seconds) to obtain the instantaneous bandwidth during each sampling period.
The ingress records the instantaneous bandwidths.
2. Calculates an average bandwidth.
After timer A expires, the ingress uses the records to calculate an average bandwidth (D)
to be used as a target bandwidth.
3. Calculates a path.
The ingress runs CSPF to calculate a path with bandwidth D and establishes a new
CR-LSP over that path.
4. Switches traffic to the new CR-LSP.
The ingress switches traffic to the new CR-LSP before tearing down the original
CR-LSP.
The preceding procedure repeats each time automatic bandwidth adjustment is triggered.
Bandwidth adjustment is not needed if traffic fluctuates below a specific threshold. After
the bandwidth adjustment timer expires, the ingress calculates the average bandwidth (D)
and compares it with the existing tunnel bandwidth (C). The ingress performs automatic
bandwidth adjustment only if the ratio of the difference between the two to the average
bandwidth exceeds a specific threshold. The following inequality applies:
[(D - C)/D] x 100% > Threshold
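The threshold check can be sketched as follows; the parameter names mirror the inequality above, and the units are illustrative.

```python
def needs_adjustment(avg_bw, current_bw, threshold_percent):
    """avg_bw (D) and current_bw (C) in Mbit/s; threshold in percent.
    Returns True when the relative change exceeds the threshold and
    automatic bandwidth adjustment should run."""
    if avg_bw == 0:
        return False  # no sampled traffic: nothing to adjust
    change = (avg_bw - current_bw) / avg_bw * 100  # [(D - C)/D] x 100%
    return change > threshold_percent

print(needs_adjustment(100, 80, 10))  # True: (100-80)/100 = 20% > 10%
print(needs_adjustment(100, 95, 10))  # False: 5% is below the threshold
```

Small fluctuations below the threshold leave the tunnel untouched, which is exactly the churn-avoidance behavior the procedure above describes.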
Other Usage
The following functions are supported based on automatic bandwidth adjustment:
The ingress only samples traffic on a tunnel interface, and does not perform bandwidth
adjustment.
The upper and lower limits can be set to define a range, within which the bandwidth can
fluctuate.
1.12.4.3.3 PCE+
The PCE+ solution is used for interconnection between Huawei forwarders and Huawei controllers.
Background
The ingress runs the constrained shortest path first (CSPF) algorithm and uses information
stored in the traffic engineering database (TEDB) to calculate MPLS TE tunnels. On an
inter-domain network, each ingress can only obtain topology information within a single
domain. Therefore, the ingress faces the following challenges when establishing inter-domain
tunnels:
Failure to calculate optimal E2E paths.
Failure to calculate different paths for primary and backup MPLS TE tunnels, with the
result that the paths of the primary and backup tunnels share a node on a domain border.
The PCE+ solution helps resolve the preceding issues on MPLS networks. This solution
involves two device roles:
PCE server: usually an SDN controller. A PCE server stores the path information of the
entire network and computes paths based on stored information to optimize
network-wide resource usage.
PCE client: usually an SDN forwarder serving as a tunnel ingress. A PCE client is the
initiator of path computation requests. After receiving the path computation results and
tunnel constraints from a PCE server, a PCE client sets up a TE tunnel as required.
Benefits
The PCE+ solution offers the following benefits:
A PCE calculates optimal E2E paths for MPLS TE tunnels within a PCE domain.
Stateful PCEs can be used to improve the efficiency of bandwidth resource use and
simplify network deployment and maintenance.
The PCE feature uniformly configures and manages TE topology information and tunnel
constraints, which streamlines network operation and maintenance.
PCE path calculation results can be better controlled.
Related Concepts
PCE server
Defined in relevant standards, a PCE server is an entity that can use network topology
information to calculate paths or constrained routes. A PCE server can be an operations
support system (OSS) application, a network node, or a server. A PCE server on an MPLS TE
network receives a calculation request sent by an ingress and uses TEDB information to
calculate an optimal constrained path for an MPLS TE tunnel.
PCC
A path computation client (PCC) sends a calculation request to a PCE. The ingress of an
MPLS TE tunnel can function as a PCC.
PCEP
The Path Computation Element Communication Protocol (PCEP), defined in relevant
standards, exchanges information between a PCC and a selected PCE and between PCEs in
different domains.
Domain
A domain can be an Interior Gateway Protocol (IGP) area or a Border Gateway Protocol
(BGP) autonomous system (AS). The NE20E supports IGP areas only.
LSP DB
After a PCC advertises LSP attributes of an MPLS network to all PCEs, each PCE stores
these attributes in the label switched path (LSP) databases (DBs).
Stateful PCE
Stateful PCEs construct LSP DBs to monitor LSP information, including the
assigned bandwidth and LSP establishment status, and use the LSP DB and TEDB
information to calculate optimal paths for LSPs on an MPLS network.
Implementation
The PCE feature first discovers PCEs. After members are discovered, PCCs and the PCE
server establish PCEP sessions to exchange information. Before the ingress functioning as a
PCC establishes an MPLS TE tunnel, the ingress sends a request to the selected PCE server to
calculate a path and waits for the calculation result. Unlike IETF PCE, the NE20E allows you
to verify and accept the calculated result or allows the PCE server to automatically confirm
and accept the calculated path. After the calculated path is confirmed, the PCE server replies
with this result to the PCC. Upon receiving the calculation result, the PCC establishes an LSP.
To improve network bandwidth usage efficiency and simplify network operation and
maintenance, the NE20E implements Stateful PCEs and Uniform TE Network Information
Configuration and Management.
PCE Discovery
An available PCE server must be discovered before a PCC sends it a path calculation
request. The PCE server, however, does not have to proactively discover a PCC. The NE20E
only supports manually configured PCE member relationships: the source IP address of a
PCE server is specified on the PCC, which then establishes a connection to that address.
You can specify multiple candidate PCE servers for the same PCC. The PCC
selects a server based on the priority and source IP address. If candidate PCE servers have the
same priority, the PCC selects a server with the smallest IP address. Other servers function as
backup servers. If the server that is selected to calculate paths fails, the PCC automatically
selects another server.
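The selection rule can be sketched as follows; the tuple representation is illustrative, and treating a larger numeric priority value as more preferred is an assumption of this sketch.

```python
import ipaddress

def select_pce_server(candidates):
    """candidates: list of (priority, source_ip) tuples. Picks the
    highest-priority server; among equal priorities, the one with the
    smallest source IP address wins (assumption: larger priority value
    means more preferred)."""
    return min(
        candidates,
        key=lambda c: (-c[0], ipaddress.ip_address(c[1])),
    )

servers = [(10, "192.0.2.9"), (10, "192.0.2.2"), (5, "192.0.2.1")]
print(select_pce_server(servers))  # (10, '192.0.2.2')
```

The two equal-priority servers are distinguished by address, so 192.0.2.2 is selected and 192.0.2.9 remains a backup, matching the failover behavior described above.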
PCEP Sessions
After a PCC discovers PCEs in different domains, the PCC establishes a PCEP session with a
selected PCE within a specific domain, and the PCEs in different domains establish PCEP
sessions with each other. The devices exchange information, including path calculation results,
over the sessions.
The process of establishing a PCEP session between two PCEs in different domains is similar to the
preceding process of establishing a PCEP session between the PCC and PCE.
The process is as follows:
1. The ingress is configured as a PCC and sends a request to a PCE to establish an
LSP on the network shown in Figure 1-1107.
2. The ingress sends the PCE server a PCEP Report message to calculate a path and
delegate the LSP.
3. Upon receipt, the PCE server obtains the ingress and egress addresses carried in
the message and uses TEDB information to calculate the optimal path between the
ingress and egress. After the PCE server receives the Report message, it saves
LSP information carried in the message to the LSP DB. The PCE server then uses
the TEDB information and the local policy to calculate paths or globally optimize
paths.
4. The PCE server sends an Update message to notify the ingress of the calculation
result.
5. The ingress uses RSVP signaling to establish an LSP over the calculated path.
Stateful PCEs
Stateful PCEs help establish optimal paths for both primary and backup TE LSPs. Although
MPLS TE is used to properly assign network resources and improve network bandwidth
usage, the TE LSP establishment mechanism is insufficient to serve these purposes. In Figure
1-1108, each link has 10 Gbit/s bandwidth. The LSP between nodes A and E needs 6 Gbit/s
bandwidth, the LSP between nodes C and D needs 8 Gbit/s bandwidth, and the LSP between
nodes C and G needs 4 Gbit/s bandwidth. The setup priority of the LSP between nodes A and
E is the highest. The C-to-D link has less than 12 Gbit/s bandwidth and a priority lower than
the A-to-E link. Without stateful PCEs, these three LSPs shown in Figure 1-1108 (a) will be
established. As a result, these established LSPs use all links on the network, which is an
extremely inefficient use of network bandwidth.
Alternatively, stateful PCEs can be used to improve network bandwidth usage. For example,
in Figure 1-1108 (b), stateful PCEs are used to establish the three LSPs over optimal paths.
The bandwidth of the links between A and B, B and C, and D and E remains available.
Stateful PCEs construct LSP DBs to monitor LSP information, including assigned bandwidth
and establishment status. After stateful PCE is enabled on each node, the PCC advertises LSP
attributes to the now stateful PCEs, and the stateful PCEs construct LSP DBs to store LSP
attributes. All nodes on the MPLS network then have LSP DBs that contain consistent
information. The stateful PCEs then use TEDB and LSP DB information to calculate paths for
LSPs. Stateful PCEs work in either of the following modes:
Active stateful PCE: Each PCE automatically updates the LSP status and parameters,
while calculating paths.
Passive stateful PCE: PCEs calculate paths, but do not update the LSP status or
parameters.
Background
MPLS TE provides various TE and reliability functions, and MPLS TE applications keep
increasing. The complexity of MPLS TE tunnel configurations, however, also increases.
Manually configuring full-meshed TE tunnels on a large network is laborious and time-consuming. To
address the issues, the HUAWEI NE20E-S2 implements the IP-prefix tunnel function. This
function uses an IP prefix list to automatically establish a number of tunnels to specified
destination IP addresses and applies a tunnel template that contains public attributes to these
tunnels. MPLS TE tunnels that meet expectations can be established in a batch.
Benefits
The IP-prefix tunnel function allows you to establish MPLS TE tunnels in a batch. This
function satisfies various configuration requirements, such as reliability requirements, and
reduces TE network deployment workload.
Implementation
The IP-prefix tunnel implementation is as follows:
1. Configure an IP prefix list that contains multiple destination IP addresses.
2. Configure a tunnel template to set public attributes.
3. Use the template to automatically establish MPLS TE tunnels to the specified destination
IP addresses.
The IP-prefix tunnel function uses the IP prefix list to filter LSR IDs in the traffic engineering
database (TEDB). Only the LSR IDs that match the IP prefix list can be used as destination IP
addresses of MPLS TE tunnels that are to be automatically established. After LSR IDs in the
TEDB are added or deleted, the IP-prefix tunnel function automatically creates or deletes
tunnels, respectively. The tunnel template that the IP-prefix tunnel function uses contains
various configured attributes, such as the bandwidth, priorities, affinities, TE FRR, CR-LSP
backup, and automatic bandwidth adjustment. The attributes are shared by MPLS TE tunnels
that are established in a batch.
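The filtering step can be sketched as follows; the prefix-list semantics are simplified to plain longest-prefix membership, and the data values are illustrative.

```python
import ipaddress

def tunnel_destinations(tedb_lsr_ids, prefix_list):
    """tedb_lsr_ids: iterable of LSR IDs as dotted strings.
    prefix_list: iterable of CIDR prefixes, e.g. '10.1.0.0/16'.
    Returns the LSR IDs that match the prefix list and therefore
    become destinations of automatically established tunnels."""
    nets = [ipaddress.ip_network(p) for p in prefix_list]
    return [
        lsr for lsr in tedb_lsr_ids
        if any(ipaddress.ip_address(lsr) in n for n in nets)
    ]

tedb = ["10.1.1.1", "10.2.1.1", "192.168.0.1"]
print(tunnel_destinations(tedb, ["10.1.0.0/16"]))  # ['10.1.1.1']
```

Re-running the filter after the TEDB changes models the behavior above: newly matching LSR IDs gain tunnels, and removed IDs have their tunnels deleted.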
Background
MPLS TE provides a set of tunnel update mechanisms, which prevents traffic loss during
tunnel updates. In real-world situations, an administrator can modify the bandwidth or explicit
path attributes of an established MPLS TE tunnel based on service requirements. An updated
topology allows for a path better than the existing one, over which an MPLS TE tunnel can be
established. Any change in bandwidth or path attributes causes a CR-LSP in an MPLS TE
tunnel to be reestablished using new attributes and causes traffic to switch from the previous
CR-LSP to the newly established CR-LSP. During the traffic switchover, the
make-before-break mechanism prevents the traffic loss that would occur if the original
path were torn down before traffic switches to the new path.
Principles
Make-before-break is a mechanism that allows a CR-LSP to be established using changed
bandwidth and path attributes over a new path before the original CR-LSP is torn down. It
helps minimize data loss and additional bandwidth consumption. The new CR-LSP is called a
modified CR-LSP. Make-before-break is implemented using the shared explicit (SE) resource
reservation style.
The new CR-LSP competes with the original CR-LSP on some shared links for bandwidth.
The new CR-LSP cannot be established if it fails the competition. The make-before-break
mechanism allows the system to reserve bandwidth used by the original CR-LSP for the new
CR-LSP, without calculating the bandwidth to be reserved. Additional bandwidth is used if
links on the new path do not overlap the links on the original path.
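The per-link bandwidth accounting under the SE reservation style can be sketched as follows; this is a simplification of the mechanism described above, with illustrative values.

```python
def extra_bandwidth_needed(old_bw, new_bw, shared):
    """Additional bandwidth (Mbit/s) the new CR-LSP must find on a
    link. On links shared with the original CR-LSP, the existing
    reservation is reused rather than counted twice."""
    return max(new_bw - old_bw, 0) if shared else new_bw

# A shared link carrying a 40 Mbit/s reservation needs no extra
# bandwidth for a 40 Mbit/s replacement CR-LSP.
print(extra_bandwidth_needed(40, 40, shared=True))   # 0
# Growing the tunnel to 40 Mbit/s over a shared 30 Mbit/s link
# needs only 10 Mbit/s more.
print(extra_bandwidth_needed(30, 40, shared=True))   # 10
# A link not on the original path must supply the full 40 Mbit/s.
print(extra_bandwidth_needed(0, 40, shared=False))   # 40
```

The second case corresponds to the bandwidth-increase example later in this section, where the shared LSRC-LSRD link releases its 30 Mbit/s to the new 40 Mbit/s CR-LSP.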
In this example, the maximum reservable bandwidth on each link is 60 Mbit/s on the network
shown in Figure 1-1109. A CR-LSP along the path LSRA → LSRB → LSRC → LSRD is
established, with the bandwidth of 40 Mbit/s.
The path is expected to change to LSRA → LSRE → LSRC → LSRD to forward data
because LSRE has a light load. The reservable bandwidth of the link between LSRC and
LSRD is just 20 Mbit/s. The total available bandwidth for the new path is less than 40 Mbit/s.
The make-before-break mechanism can be used in this situation.
The make-before-break mechanism allows the newly established CR-LSP over the path LSRA
→ LSRE → LSRC → LSRD to use the bandwidth of the original CR-LSP's link between
LSRC and LSRD. After the new CR-LSP is established over the path, traffic switches to the
new CR-LSP, and the original CR-LSP is torn down.
In addition to the preceding method, another method of increasing the tunnel bandwidth can
be used. If the reservable bandwidth of a shared link increases to a certain extent, a new
CR-LSP can be established.
In the example shown in Figure 1-1109, the maximum reservable bandwidth on each link is
60 Mbit/s. A CR-LSP along the path LSRA → LSRB → LSRC → LSRD is established, with
the bandwidth of 30 Mbit/s.
The path is expected to change to LSRA → LSRE → LSRC → LSRD to forward data
because LSRE has a light load, and the bandwidth is expected to increase to 40 Mbit/s. The
reservable bandwidth of the link between LSRC and LSRD is just 30 Mbit/s. The total
available bandwidth for the new path is less than 40 Mbit/s. The make-before-break
mechanism can be used in this situation.
The make-before-break mechanism allows the newly established CR-LSP over the path LSRA
→ LSRE → LSRC → LSRD to use the bandwidth of the original CR-LSP's link between
LSRC and LSRD. The bandwidth of the new CR-LSP is 40 Mbit/s, out of which 30 Mbit/s is
released by the link between LSRC and LSRD. After the new CR-LSP is established, traffic
switches to the new CR-LSP and the original CR-LSP is torn down.
1.12.4.5.2 TE FRR
Traffic engineering (TE) fast reroute (FRR) protects links and nodes on MPLS TE tunnels. If
a link or node fails, TE FRR rapidly switches traffic to a backup path, minimizing traffic loss.
Background
A link or node failure in an MPLS TE tunnel triggers a primary/backup CR-LSP switchover.
During the switchover, IGP routes converge to a backup CR-LSP, and CSPF recalculates a
path over which the primary CR-LSP is reestablished. Traffic is dropped during this process.
TE FRR can be used to minimize traffic loss. TE FRR establishes a backup path that excludes
faulty links or nodes. The backup path can rapidly take over traffic, minimizing traffic loss. In
addition, the ingress attempts to reestablish the primary CR-LSP.
Benefits
TE FRR provides carrier-class local protection capabilities for MPLS TE CR-LSPs to
improve network reliability.
Related Concepts
TE FRR works in either facility or one-to-one backup mode.
Facility backup
Figure 1-1110 illustrates facility backup networking.
TE FRR working in facility backup mode establishes a bypass tunnel for each link or
node that may fail on a primary tunnel. A bypass tunnel can protect traffic on multiple
primary tunnels. TE FRR in facility backup mode is configured to establish a single
bypass tunnel to protect primary tunnels. This mode is extensible, resource efficient, and
easy to implement. Bypass tunnels must be manually planned and configured, which is
time-consuming and laborious on a complex network.
One-to-one backup
Figure 1-1111 illustrates one-to-one backup networking.
Table 1-341 Nodes and paths that support facility and one-to-one backup
Table 1-342 describes TE FRR protection functions implemented in facility and one-to-one
backup modes.
Table 1-342 TE FRR protection functions implemented in facility and one-to-one backup modes
Classified by protected object:
Node protection: A PLR and an MP are indirectly connected. A bypass CR-LSP
protects a direct link to the PLR and nodes on the primary CR-LSP's path between
the PLR and MP. Both the bypass CR-LSP in Figure 1-1110 and detour LSP 1 in
Figure 1-1111 provide node protection.
Link protection: A PLR and an MP are directly connected. A bypass CR-LSP only
protects the direct link to the PLR. Detour LSP 2 in Figure 1-1111 provides link
protection.
Classified by bandwidth to be reserved:
Bandwidth protection: In facility backup mode, a bypass CR-LSP can provide
bandwidth protection for a primary CR-LSP only when the bypass CR-LSP has
bandwidth higher than or equal to that of the primary CR-LSP. In one-to-one backup
mode, a detour LSP by default has the same bandwidth as the protected primary
CR-LSP and provides bandwidth protection.
A bypass CR-LSP working in facility backup mode supports a combination of protection types. For
example, a bypass CR-LSP can implement manual, node, and bandwidth protection.
Implementation
Facility backup implementation
The process of implementing TE FRR in facility backup mode is as follows:
1. The ingress establishes a primary CR-LSP.
2. The PLR binds a bypass CR-LSP to the primary CR-LSP.
The process of searching for a suitable bypass CR-LSP is also called bypass CR-LSP
binding. Only a primary CR-LSP with the "local protection desired" flag can trigger a
binding process. The binding must be complete before a primary/bypass CR-LSP
switchover is performed. During the binding, the PLR must obtain information about the
outbound interface of the bypass CR-LSP, next hop label forwarding entry (NHLFE),
label switching router (LSR) ID of the MP, label allocated by the MP, and protection
type.
The PLR already obtains the next hop (NHOP) and next NHOP (NNHOP) of the primary
CR-LSP. The PLR establishes a bypass CR-LSP to provide a specific type of protection
based on the NHOP and NNHOP LSR IDs:
− Link protection can be provided if the egress LSR ID of the bypass CR-LSP is the
same as the NHOP LSR ID.
− Node protection can be provided if the egress LSR ID of the bypass CR-LSP is the
same as the NNHOP LSR ID.
For example, in Figure 1-1113, bypass CR-LSP 1 protects a link, and bypass CR-LSP 2
protects a node.
If multiple bypass CR-LSPs are established, the PLR selects one with the highest priority.
Protection types are prioritized in descending order: bandwidth protection, non-
bandwidth protection, node protection, link protection, manual protection, and automatic
protection. Both bypass CR-LSPs 1 and 2 shown in Figure 1-1113 are manually
configured to provide bandwidth protection. Bypass CR-LSP 1 that protects a link has a
lower priority than bypass CR-LSP 2 that protects a node. In this situation, only bypass
CR-LSP 2 can be bound to a primary CR-LSP. If bypass CR-LSP 1 protects bandwidth
and bypass CR-LSP 2 does not, only bypass CR-LSP 1 can be bound to the primary
CR-LSP.
After a bypass CR-LSP is successfully bound to the primary CR-LSP, the NHLFE of the
primary CR-LSP is recorded. The NHLFE contains the NHLFE index of the bypass
CR-LSP and the inner label assigned by the MP. The inner label is used to forward traffic
during FRR switching.
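The binding priority rules described above can be sketched as follows. The `BypassLsp` structure, its field names, and the tuple-based ordering are illustrative assumptions; they model the documented priority order (bandwidth over non-bandwidth, node over link, manual over automatic), not the device's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class BypassLsp:
    """Illustrative bypass CR-LSP attributes (hypothetical structure)."""
    name: str
    bandwidth_protection: bool  # reserves bandwidth >= primary CR-LSP
    node_protection: bool       # egress LSR ID == NNHOP LSR ID (else link protection)
    manual: bool                # manually configured (else automatically created)

def select_bypass(candidates):
    """Pick the highest-priority bypass CR-LSP for binding.

    The tuple key encodes the documented ordering, most significant
    first: bandwidth protection, then node protection, then manual
    configuration.
    """
    return max(candidates,
               key=lambda b: (b.bandwidth_protection, b.node_protection, b.manual))

# The example from the text: both bypass CR-LSPs provide bandwidth
# protection, so the node-protecting one (bypass 2) wins the binding.
bypass1 = BypassLsp("bypass-1", bandwidth_protection=True, node_protection=False, manual=True)
bypass2 = BypassLsp("bypass-2", bandwidth_protection=True, node_protection=True, manual=True)
assert select_bypass([bypass1, bypass2]).name == "bypass-2"

# If bypass 1 protects bandwidth and bypass 2 does not, bypass 1 wins.
bypass2.bandwidth_protection = False
assert select_bypass([bypass1, bypass2]).name == "bypass-1"
```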
3. The PLR detects faults.
− In link protection, a data link layer protocol is used to detect and advertise faults.
The speed of fault detection at the data link layer depends on link types.
− In node protection, a data link layer protocol is used to detect link faults. If no link
fault occurs, RSVP Hello detection or bidirectional forwarding detection (BFD) for
RSVP is used to detect faults in protected nodes.
If a link or node fault is detected, FRR switching is triggered immediately.
If node protection is enabled, only the link between the protected node and PLR is protected. The PLR
cannot detect faults in the link between the protected node and MP.
In Figure 1-1114, the bypass CR-LSP provides node protection. If the link between
LSRB and LSRC fails or LSRC fails, LSRB (PLR) swaps an inner label 1024 for an
inner label 1022, pushes an outer label 34 into a packet, and forwards the packet along
the bypass CR-LSP. After the packet arrives at LSRD, LSRD forwards the packet to
LSRE at the next hop. Figure 1-1115 illustrates the forwarding process after TE FRR
switching is complete.
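The PLR's forwarding action in this example can be sketched as a pair of label-stack operations: swap the inner label assigned by the MP, then push the bypass tunnel's outer label. The list-based stack model (top of stack at index 0) and the function name are illustrative; the label values follow the example above.

```python
def frr_switch_forward(label_stack, inner_in, inner_out, outer_bypass):
    """Sketch of the PLR's forwarding action after TE FRR switching:
    swap the incoming inner label for the one assigned by the MP,
    then push the bypass CR-LSP's outer label."""
    assert label_stack[0] == inner_in
    stack = [inner_out] + label_stack[1:]  # swap the inner label
    return [outer_bypass] + stack          # push the outer label

# The example from the text: LSRB (PLR) swaps inner label 1024 for 1022
# and pushes outer label 34 before forwarding along the bypass CR-LSP.
assert frr_switch_forward([1024], inner_in=1024, inner_out=1022, outer_bypass=34) == [34, 1022]
```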
Except the egress, each node on the primary CR-LSP attempts to establish a detour LSP
to protect a downstream link or node. Only qualified nodes can function as PLRs and
establish detour LSPs over paths calculated using CSPF.
Each PLR obtains NHOP information. A PLR establishes a detour LSP to provide a
specific type of protection:
− Link protection is provided if the MP LSR ID on a detour LSP is the same as the
NHOP LSR ID. Detour LSP 2 in Figure 1-1116 provides link protection.
− Node protection is provided if the MP LSR ID on a detour LSP differs from the
NHOP LSR ID when other nodes exist between the PLR and MP. Detour LSP 1 in
Figure 1-1116 provides node protection.
If a PLR can establish detour LSPs that provide both link and node protection, the PLR
only establishes a detour LSP that supports node protection.
3. A PLR detects faults.
− In link protection, a data link layer protocol is used to detect and advertise faults.
The speed of fault detection at the data link layer depends on link types.
− In node protection, a data link layer protocol is used to detect link faults. If no link
fault occurs, RSVP Hello detection or BFD is used to detect faults in a protected
node.
If a link or node fault is detected, FRR switching is triggered immediately.
If node protection is enabled, only the link between the protected node and PLR is protected. The PLR
cannot detect faults in the link between the protected node and MP.
detour LSP (named detour LSP 1, for example). LSRE swaps label 36 for label 37 and
sends the packet to LSRC. Detour LSP 1 overlaps the primary CR-LSP since LSRC.
Therefore, LSRC uses a label for the primary CR-LSP and sends the packet to the egress
LSRD.
5. The ingress on the primary CR-LSP performs a traffic switchback.
After performing a traffic switchover, the ingress on the primary CR-LSP attempts to
reestablish a modified CR-LSP using the make-before-break mechanism. The ingress
then switches service traffic and RSVP messages to the established modified CR-LSP
and tears down the original primary CR-LSP.
Other Usage
When TE FRR is in the FRR-in-use state, an interface sends RSVP messages without the
interface authentication TLV to the remote interface. Upon receipt of such a message, the
remote interface skips interface authentication. To enable authentication in this situation,
configure the neighbor authentication mode.
TE FRR can be used to implement board removal protection. Board removal protection
enables a PLR to retain information about the primary CR-LSP's outbound interface that
resides on an interface board of the PLR. If the interface board is removed, the PLR rapidly
switches MPLS TE traffic to a bypass CR-LSP or a detour LSP. After the interface board is
re-installed, the PLR switches MPLS TE traffic back to the primary CR-LSP through the
outbound interface. Board removal protection protects traffic on the primary CR-LSP's
outbound interface of the PLR.
Without board removal protection, after an interface board on which a tunnel interface resides
is removed from the PLR, CR-LSP information is lost on the PLR. To prevent CR-LSP
information loss, ensure that the interface board to be removed does not have the following
interfaces: primary CR-LSP's tunnel interface, bypass CR-LSP's tunnel interface, bypass
CR-LSP's outbound interface, or detour LSP's outbound interface.
Configuring a TE tunnel interface on the PLR's IPU is recommended. If the interface board on
which the primary CR-LSP's physical outbound interface resides is removed or fails, the PLR
sets the outbound interface to the Stale state. The PLR's main control board retains
information about each FRR-enabled primary CR-LSP that passes through the outbound
interface. After the interface board is re-installed, the outbound interface becomes available
again. Each primary CR-LSP is then automatically reestablished.
Ordinary backup: A backup CR-LSP is set up only after the primary CR-LSP fails and
then takes over traffic from the primary CR-LSP. If the primary CR-LSP recovers,
traffic switches back to the primary CR-LSP.
Table 1-343 lists differences between hot-standby and ordinary CR-LSPs.
Best-effort path
The hot standby function supports the establishment of best-effort paths. If both the
primary and hot-standby CR-LSPs fail, a best-effort path is established and takes over
traffic.
As shown in Figure 1-1117, the primary CR-LSP uses the path PE1 -> P1 -> PE2, and
the backup CR-LSP uses the path PE1 -> P2 -> PE2. If both the primary and backup
CR-LSPs fail, PE1 triggers the setup of a best-effort path PE1 -> P2 -> P1 -> PE2.
A best-effort path does not provide reserved bandwidth for traffic. The affinity attribute and hop limit are
configured as needed.
Path Overlapping
The path overlapping function can be configured for hot-standby CR-LSPs. This function
allows a hot-standby CR-LSP to use links of a primary CR-LSP. The hot-standby CR-LSP
protects traffic on the primary CR-LSP.
Background
Most live IP radio access networks (RANs) use ring topologies and have the access ring
separated from the aggregation ring. To improve the end-to-end and inter-ring LSP reliability,
many IP RAN carriers require isolated primary and hot-standby LSPs. The CSPF algorithm
does not meet this reliability requirement, because CSPF is a metric-based path computing
algorithm that may compute two intersecting LSPs. Specifying explicit paths can meet this
reliability requirement; this method, however, does not adapt to topology changes. Each time
a node is added to or deleted from the IP RAN, operators must configure new explicit paths,
which is time-consuming and laborious. To resolve these problems, you can configure isolated
LSP computation.
Figure 1-1118 illustrates an IP RAN on which an MPLS TE tunnel is established between a
cell site gateway (CSG) on the access ring and a radio service gateway (RSG) on the
aggregation ring. The MPLS TE tunnel implements the end-to-end virtual private network
(VPN) service. To improve the network reliability, this network requires the constraint-based
routed label switched path (CR-LSP) hot standby feature and isolated primary and
hot-standby LSPs.
Without the isolated LSP computation feature, CSPF on this network will compute CSG ->
ASG1 -> ASG2 -> RSG as the primary LSP. This LSP does not have an isolated hot-standby
LSP. However, two isolated LSPs exist on this network: CSG -> ASG1 -> RSG and CSG ->
ASG2 -> RSG. With the isolated LSP computation feature, the disjoint and CSPF algorithms
work simultaneously to get the two isolated LSPs.
Figure 1-1118 Application of isolated LSP computation on an end-to-end VPN bearer network
Implementation
Isolated LSP computation is implemented by both the disjoint and CSPF algorithms. This
feature computes primary and hot-standby LSPs simultaneously and cuts off overlapping
paths of the two LSPs to get two isolated LSPs. In the example shown in Figure 1-1119,
before isolated LSP computation is configured, CSPF computes LSRA -> LSRB -> LSRC ->
LSRD as the primary LSP and LSRA -> LSRC -> LSRD as the hot-standby LSP if path
overlapping is allowed. These two LSPs intersect, so that they do not meet the reliability
requirement.
After isolated LSP computation is configured, the disjoint and CSPF algorithms compute
LSRA -> LSRB -> LSRD as the primary LSP and LSRA -> LSRC -> LSRD as the
hot-standby LSP. These two LSPs do not intersect, so that they meet the reliability
requirement.
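Under stated assumptions, the behavior above can be approximated by a two-pass sketch: compute a primary path, remove its links, and recompute. The adjacency-dict topology, link costs, and node names are hypothetical, and the device's actual disjoint algorithm is not published here; a two-pass approach can also fail on topologies where a true disjoint algorithm succeeds, which is consistent with the feature being best-effort.

```python
from heapq import heappush, heappop

def shortest_path(graph, src, dst, banned_edges=frozenset()):
    """Dijkstra over an adjacency dict {node: {neighbor: cost}},
    skipping any edge listed in banned_edges (frozensets of endpoints)."""
    dist, prev, seen = {src: 0}, {}, set()
    heap = [(0, src)]
    while heap:
        d, u = heappop(heap)
        if u in seen:
            continue
        seen.add(u)
        if u == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        for v, c in graph[u].items():
            if frozenset((u, v)) in banned_edges:
                continue
            if d + c < dist.get(v, float("inf")):
                dist[v], prev[v] = d + c, u
                heappush(heap, (d + c, v))
    return None

def isolated_pair(graph, src, dst):
    """Two-pass sketch: compute a primary path, then recompute with the
    primary's links excluded to obtain a link-disjoint hot-standby path."""
    primary = shortest_path(graph, src, dst)
    used = {frozenset(e) for e in zip(primary, primary[1:])}
    return primary, shortest_path(graph, src, dst, banned_edges=used)

# Hypothetical square topology: two disjoint paths exist between LSRA and LSRD.
topo = {"LSRA": {"LSRB": 1, "LSRC": 2}, "LSRB": {"LSRA": 1, "LSRD": 1},
        "LSRC": {"LSRA": 2, "LSRD": 2}, "LSRD": {"LSRB": 1, "LSRC": 2}}
primary, backup = isolated_pair(topo, "LSRA", "LSRD")
assert primary == ["LSRA", "LSRB", "LSRD"]
assert backup == ["LSRA", "LSRC", "LSRD"]
```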
Isolated LSP computation is a best-effort technique. If the disjoint and CSPF algorithms cannot get
isolated primary and hot-standby LSPs or two isolated LSPs do not exist, the device uses the
primary and hot-standby LSPs computed by CSPF.
The disjoint algorithm cannot work together with the following features: explicit path, affinity
property, and hop limit. Therefore, before you configure isolated LSP computation, check that all
those features are disabled. Otherwise, the device does not allow you to configure isolated LSP
computation. After you configure isolated LSP computation, the device does not allow you to
configure any of those features, either.
After you configure isolated LSP computation, the shared risk link group (SRLG), if configured,
becomes ineffective.
Usage Scenario
Isolated LSP computation applies to networks on which Resource Reservation Protocol -
Traffic Engineering (RSVP-TE) tunnels and the hot standby feature are configured.
Benefits
Isolated LSP computation offers the following benefits to carriers:
Improves the network reliability.
Reduces the maintenance workload.
Background
If a device is unable to store new link state protocol data units (LSPs) or use LSPs to update
its link state database (LSDB), the device will calculate incorrect routes, causing
forwarding failures. The IS-IS overload function places such a device in the IS-IS overload
state to prevent these forwarding failures. The association between CR-LSP establishment
and the IS-IS overload function configures the ingress to establish a CR-LSP that excludes
the overloaded IS-IS device, helping the CR-LSP reliably transmit MPLS TE traffic.
Related Concepts
IS-IS overload state
When a device cannot store new LSPs or use LSPs to update its LSDB, the device will
incorrectly calculate IS-IS routes and enters the overload state. For example, an IS-IS
device becomes overloaded if its available memory drops to a specified threshold or if an
exception occurs on the device. A device can also be manually configured to enter the
IS-IS overload state.
Implementation
In Figure 1-1120, RT1 supports the association between CR-LSP establishment and the IS-IS
overload function. RT3 and RT4 support the IS-IS overload function.
In Figure 1-1120, devices RT1 to RT4 are in an IS-IS area. RT1 establishes a CR-LSP named
Tunnel1 destined for RT2 along the path RT1 -> RT3 -> RT2. Association between the
CR-LSP establishment and IS-IS overload is implemented as follows:
1. If RT3 enters the IS-IS overload state, IS-IS propagates packets carrying overload
information in the IS-IS area.
2. RT1 determines that RT3 is overloaded and re-calculates the CR-LSP destined for RT2.
3. RT1 calculates a new path RT1 -> RT4 -> RT2, which bypasses the overloaded IS-IS
node. Then RT1 establishes a new CR-LSP along this path.
4. After the new CR-LSP is established, RT1 switches traffic from the original CR-LSP to
the new CR-LSP, ensuring service transmission quality.
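The recalculation in steps 2 and 3 can be sketched as pruning the overloaded node from the topology before recomputing the path. The adjacency-dict topology and the minimal BFS standing in for CSPF are illustrative assumptions.

```python
from collections import deque

def exclude_overloaded(graph, overloaded):
    """Remove overloaded IS-IS nodes (and links to them) from the
    topology before path computation."""
    return {u: {v: c for v, c in nbrs.items() if v not in overloaded}
            for u, nbrs in graph.items() if u not in overloaded}

def any_path(graph, src, dst):
    """Minimal BFS path finder standing in for CSPF in this sketch."""
    q, prev = deque([src]), {src: None}
    while q:
        u = q.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in graph[u]:
            if v not in prev:
                prev[v] = u
                q.append(v)
    return None

# Topology modeled on Figure 1-1120: RT1 reaches RT2 via RT3 or RT4.
topo = {"RT1": {"RT3": 1, "RT4": 1}, "RT3": {"RT1": 1, "RT2": 1},
        "RT4": {"RT1": 1, "RT2": 1}, "RT2": {"RT3": 1, "RT4": 1}}
assert any_path(topo, "RT1", "RT2") == ["RT1", "RT3", "RT2"]

# After RT3 advertises the overload state, it is pruned and the new
# path bypasses it, as in step 3 of the text.
assert any_path(exclude_overloaded(topo, {"RT3"}), "RT1", "RT2") == ["RT1", "RT4", "RT2"]
```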
1.12.4.5.6 SRLG
The shared risk link group (SRLG) functions as a constraint that is used to calculate a backup
path in the scenario where CR-LSP hot standby or TE FRR is used. This constraint helps
prevent backup and primary paths from overlapping over links with the same risk level,
improving MPLS TE tunnel reliability as a consequence.
Background
Carriers use CR-LSP hot standby or TE FRR to improve MPLS TE tunnel reliability.
However, in real-world situations protection failures can occur, requiring the SRLG technique
to be configured as a preventative measure, as the following example demonstrates.
The primary tunnel is established over the path PE1 → P1 → P2 → PE2 on the network
shown in Figure 1-1121. The link between P1 and P2 is protected by a TE FRR bypass tunnel
established over the path P1 → P3 → P2.
In the lower part of Figure 1-1121, core nodes P1, P2, and P3 are connected using a transport
network device. They share some transport network links marked in yellow. If a fault occurs
on a shared link, both the primary and FRR bypass tunnels are affected, causing an FRR
protection failure. An SRLG can be configured to prevent the FRR bypass tunnel from sharing
a link with the primary tunnel, ensuring that FRR properly protects the primary tunnel.
Related Concepts
An SRLG is a set of links at the same risk of faults. If one link in an SRLG fails, the other
links in the group may also fail. If a link in this group is used by a hot-standby CR-LSP or
an FRR bypass tunnel, the hot-standby CR-LSP or FRR bypass tunnel cannot provide
reliable protection.
Implementation
The SRLG link attribute is a number; links configured with the same SRLG number belong
to the same SRLG.
Interior Gateway Protocol (IGP) TE advertises SRLG information to all nodes in a single
MPLS TE domain. The constraint shortest path first (CSPF) algorithm uses the SRLG
attribute together with other constraints, such as bandwidth, to calculate a path.
The MPLS TE SRLG works in either of the following modes:
Strict mode: The SRLG attribute is a necessary constraint used by CSPF to calculate a
path for a hot-standby CR-LSP or an FRR bypass tunnel.
Preferred mode: The SRLG attribute is an optional constraint used by CSPF to calculate
a path for a hot-standby CR-LSP or FRR bypass tunnel. For example, if CSPF fails to
calculate a path for a hot-standby CR-LSP based on the SRLG attribute, CSPF
recalculates the path, regardless of the SRLG attribute.
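The strict/preferred distinction can be sketched as follows; `compute_path`, the `Link` structure, and the SRLG numbers are illustrative stand-ins for CSPF and the link database.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    """Illustrative link record carrying its SRLG numbers."""
    name: str
    srlgs: frozenset

def cspf_with_srlg(compute_path, links, primary_srlgs, mode):
    """Sketch of strict vs. preferred SRLG handling.

    compute_path is a stand-in for CSPF: it receives the set of usable
    links and returns a path (or None if no path exists). In strict
    mode the SRLG constraint is mandatory; in preferred mode CSPF
    retries without the constraint when the first attempt fails.
    """
    usable = [l for l in links if not (l.srlgs & primary_srlgs)]
    path = compute_path(usable)
    if path is None and mode == "preferred":
        path = compute_path(links)  # fall back: ignore the SRLG attribute
    return path

# Hypothetical example: the only candidate link shares SRLG 10 with the
# primary path, so strict mode fails while preferred mode falls back.
pick_first = lambda ls: ls[0].name if ls else None
risky = Link("P1-P3", frozenset({10}))
assert cspf_with_srlg(pick_first, [risky], frozenset({10}), "strict") is None
assert cspf_with_srlg(pick_first, [risky], frozenset({10}), "preferred") == "P1-P3"
```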
Usage Scenario
The SRLG attribute is used in either the TE FRR or CR-LSP hot-standby scenario.
Benefits
The SRLG attribute limits the selection of a path for a hot-standby CR-LSP or an FRR bypass
tunnel, which prevents the primary and bypass tunnels from sharing links with the same risk
level.
Related Concepts
Concepts related to a tunnel protection group are as follows:
Working tunnel: a tunnel to be protected.
Protection tunnel: a tunnel that protects a working tunnel.
Protection switchover: switches traffic from a faulty working tunnel to a protection
tunnel in a tunnel protection group, which improves network reliability.
Figure 1-1122 illustrates a tunnel protection group.
The primary tunnel tunnel-1 and the protection tunnel tunnel-2 are established on the ingress
LSRA on the network shown in Figure 1-1122.
Tunnel-2 is configured as a protection tunnel for primary tunnel tunnel-1 on LSRA. If the
configured fault detection mechanism on the ingress detects a fault in tunnel-1, traffic
switches to tunnel-2. LSRA attempts to reestablish tunnel-1. If tunnel-1 is successfully
established, traffic switches back to the primary tunnel.
Implementation
An MPLS TE tunnel protection group uses a configured protection tunnel to protect traffic on
the working tunnel, improving tunnel reliability. To ensure effective protection, plan the
network so that the protection tunnel excludes the links and nodes through which the
working tunnel passes.
Table 1-344 describes the implementation procedure of a tunnel protection group.
1. Establishment: The working and protection tunnels must have the same ingress and
destination address. The protection tunnel is established in the same procedure as a
regular tunnel and can use attributes that differ from those of the working tunnel.
Ensure that the working and protection tunnels are established over different paths as
much as possible.
NOTE
A protection tunnel cannot itself be protected or enabled with TE FRR.
2. Binding between the working and protection tunnels: The protection tunnel is bound to
the tunnel ID of the working tunnel so that the two tunnels form a tunnel protection
group.
3. Fault detection: MPLS OAM/MPLS-TP OAM is used to detect faults in a tunnel
protection group to speed up protection switching.
4. Protection switching: The tunnel protection group supports either of the following
protection switching modes:
− Manual switching: Traffic is forcibly switched to the protection tunnel.
− Automatic switching: Traffic automatically switches to the protection tunnel if the
working tunnel fails. A time interval can be set for automatic switching.
An MPLS TE tunnel protection group supports only bidirectional switching. If a traffic
switchover is performed for traffic in one direction, a traffic switchover is also
performed for traffic in the opposite direction.
5. Switchback: After a traffic switchover is implemented, the ingress attempts to
reestablish the working tunnel. If the working tunnel is reestablished, the ingress can
switch traffic back to the working tunnel or continue to forward traffic over the
protection tunnel.
Table 1-345 Comparison between CR-LSP backup and a tunnel protection group
On the network shown in Figure 1-1123, BFD is disabled. If LSRE fails, LSRA or LSRF
cannot promptly detect the fault because a Layer 2 switch exists between them. Although the
Hello mechanism detects the fault, detection lasts for a long time.
After BFD is enabled, if LSRE fails, LSRA and LSRF detect the fault rapidly, and traffic
switches to the path LSRA -> LSRB -> LSRD -> LSRF.
BFD for TE detects faults in a CR-LSP. After detecting a fault in a CR-LSP, BFD for TE
immediately notifies the forwarding plane of the fault to rapidly trigger a traffic switchover.
BFD for TE is usually used together with a hot-standby CR-LSP.
The concepts associated with BFD are as follows:
Static BFD session: established by manually setting the local and remote discriminators.
The local discriminator on a local node must match the remote discriminator on a remote
node. The minimum intervals at which BFD packets are sent and received are
changeable after a static BFD session is established.
Dynamic BFD session: established without a local or remote discriminator specified.
After a routing protocol neighbor is established between the local and remote nodes, the
RM delivers parameters to instruct the BFD module to establish a BFD session. The two
nodes negotiate the local discriminator, remote discriminator, minimum interval at which
BFD packets are sent, and minimum interval at which BFD packets are received.
Detection period: an interval at which the system checks the BFD session status. If no
packet is received from the remote end within a detection period, the BFD session is
considered Down.
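The detection-period check can be sketched as below. The document only names the detection period; the multiplier-times-interval formula follows common BFD practice (RFC 5880) and is an assumption here, as are the parameter names and millisecond units.

```python
def bfd_session_state(last_rx_time, now, detect_multiplier, negotiated_interval):
    """Sketch of the detection-period check: the session is declared
    Down when no packet arrives within
    detect_multiplier * negotiated_interval (values in milliseconds)."""
    detection_period = detect_multiplier * negotiated_interval
    return "Down" if now - last_rx_time > detection_period else "Up"

# With a 3 x 10 ms detection period, a 25 ms gap keeps the session Up,
# while a 35 ms gap brings it Down and triggers the traffic switchover.
assert bfd_session_state(last_rx_time=0, now=25, detect_multiplier=3, negotiated_interval=10) == "Up"
assert bfd_session_state(last_rx_time=0, now=35, detect_multiplier=3, negotiated_interval=10) == "Down"
```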
A BFD session is bound to a CR-LSP. A BFD session is set up between the ingress and egress.
A BFD packet is sent by the ingress to the egress along a CR-LSP. Upon receipt, the egress
responds to the BFD packet. The ingress can rapidly monitor the status of links through which
the CR-LSP passes based on whether a reply packet is received.
If a link fault is detected, BFD notifies the forwarding module of the fault. The forwarding
module searches for a backup CR-LSP and switches traffic to the backup CR-LSP. In addition,
the forwarding module reports the fault to the control plane. If dynamic BFD for TE CR-LSP
is used, the control plane proactively creates a BFD session to detect faults in the backup
CR-LSP. If static BFD for TE CR-LSP is used, a BFD session is created manually to detect
faults in the backup CR-LSP if necessary.
On the network shown in Figure 1-1124, a BFD session is set up to detect faults in the link
through which the primary CR-LSP passes. If a link fault occurs, the BFD session on the
ingress immediately notifies the forwarding plane of the fault. The ingress switches traffic to
the bypass CR-LSP and sets up a new BFD session to detect faults in the bypass CR-LSP.
On the network shown in Figure 1-1125, a primary CR-LSP is established along the path
LSRA -> LSRB, and a hot-standby CR-LSP is configured. A BFD session is set up between
LSRA and LSRB to detect faults in the primary CR-LSP. If a fault occurs on the primary
CR-LSP, the BFD session rapidly notifies LSRA of the fault. After receiving the fault
information, LSRA rapidly switches traffic to the hot-standby CR-LSP to ensure traffic
continuity.
Benefits
No tunnel protection is provided by the NG-MVPN over P2MP TE function or the VPLS
over P2MP TE function alone. If a tunnel fails, traffic can only be restored through slow,
route change-induced hard convergence. Dual-root 1+1 protection addresses this for both
functions: if a P2MP TE tunnel fails, BFD for P2MP TE rapidly detects the fault and
switches traffic, which improves fault convergence performance and reduces traffic loss.
Principles
In Figure 1-1126, BFD is enabled on the root PE1 and the backup root PE2. Leaf nodes UPE1
to UPE4 are enabled to passively create BFD sessions. Both PE1 and PE2 send BFD packets
to all leaf nodes along P2MP TE tunnels. The leaf nodes receive only the BFD packets
transmitted on the primary tunnel. If a leaf node receives detection packets within a specified
interval, the link between the root node and leaf node is working properly. If a leaf node fails
to receive BFD packets within a specified interval, the link between the root node and leaf
node fails. The leaf node then rapidly switches traffic to a protection tunnel, which reduces
traffic loss.
Background
When a Layer 2 device is deployed on a link between two RSVP nodes, an RSVP node can
only use the Hello mechanism to detect a link fault. For example, on the network shown in
Figure 1-1127, a switch exists between P1 and P2. If a fault occurs on the link between the
switch and P2, P1 keeps sending Hello packets and detects the fault after it fails to receive
replies to the Hello packets. The fault detection latency causes seconds of traffic loss. To
minimize packet loss, BFD for RSVP can be configured. BFD rapidly detects a fault and
triggers TE FRR switching, which improves network reliability.
Implementation
BFD for RSVP monitors RSVP neighbor relationships.
Unlike BFD for CR-LSP and BFD for TE that support multi-hop BFD sessions, BFD for
RSVP establishes only single-hop BFD sessions between RSVP nodes to monitor the network
layer.
BFD for RSVP, BFD for OSPF, BFD for IS-IS, and BFD for BGP can share a BFD session.
When protocol-specific BFD parameters are set for a BFD session shared by RSVP and other
protocols, the smallest values take effect. The parameters include the minimum intervals at
which BFD packets are sent, minimum intervals at which BFD packets are received, and local
detection multipliers.
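The smallest-value rule for a shared session can be sketched as follows; the dictionary keys and example values are illustrative.

```python
def effective_bfd_params(per_protocol_params):
    """When RSVP, OSPF, IS-IS, and BGP share one BFD session, the
    smallest configured value of each parameter takes effect, as
    described in the text."""
    return {
        "min_tx_interval": min(p["min_tx_interval"] for p in per_protocol_params),
        "min_rx_interval": min(p["min_rx_interval"] for p in per_protocol_params),
        "detect_multiplier": min(p["detect_multiplier"] for p in per_protocol_params),
    }

# Hypothetical per-protocol settings: the shared session takes the
# minimum of each parameter across the protocols.
rsvp = {"min_tx_interval": 100, "min_rx_interval": 100, "detect_multiplier": 4}
isis = {"min_tx_interval": 50, "min_rx_interval": 200, "detect_multiplier": 3}
assert effective_bfd_params([rsvp, isis]) == {
    "min_tx_interval": 50, "min_rx_interval": 100, "detect_multiplier": 3}
```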
Usage Scenario
BFD for RSVP applies to a network on which a Layer 2 device exists between the TE FRR
point of local repair (PLR) on a bypass CR-LSP and an RSVP node on the primary CR-LSP.
Benefits
BFD for RSVP improves reliability on MPLS TE networks with Layer 2 devices.
1.12.4.5.12 RSVP GR
RSVP graceful restart (GR) is a status recovery mechanism supported by RSVP-TE.
RSVP GR is designed based on non-stop forwarding (NSF). If a fault occurs on the control
plane of a node, the upstream and downstream neighbor nodes send messages to restore
RSVP soft states, but the forwarding plane does not detect the fault and is not affected. This
helps stably and reliably transmit traffic.
RSVP GR uses the Hello extension to detect the neighboring nodes' GR status. For more
information about the Hello feature, see .
RSVP GR principles are as follows:
On the network shown in Figure 1-1128, if the restarter performs GR, it stops sending Hello
messages to its neighbors. If the GR-enabled helpers fail to receive three consecutive Hello
messages, the helpers consider that the restarter is performing GR and retain all forwarding
information. In addition, the interface board continues transmitting services and waits for the
restarter to restore the GR status.
After the restarter restarts, if it receives Hello Path messages from helpers, it replies with
Hello ACK messages. The types of the Hello messages returned by the upstream and
downstream nodes on a tunnel are different:
If an upstream helper receives a Hello message, it sends a GR Path message downstream
to the restarter.
If a downstream helper receives a Hello message, it sends a Recovery Path message
upstream to the restarter.
Figure 1-1128 Networking diagram for restoring the GR status by sending GR Path and Recovery
Path messages
If both the GR Path and Recovery Path messages are received, the restarter creates a new
path state block (PSB) for the CR-LSP, restoring information about the CR-LSP on the
control plane.
If no Recovery Path message is sent and only a GR Path message is received, the restarter
creates the PSB based on the GR Path message alone, which likewise restores the CR-LSP
information on the control plane.
The NE20E can only function as a GR Helper to help a neighbor node to complete RSVP GR.
Background
On a network with a static bidirectional co-routed CR-LSP used to transmit services, if a few
packets are dropped or bit errors occur on links, no alarms indicating link or LSP failures are
generated, which poses difficulties in locating the faults. To locate the faults, loopback
detection can be enabled for the static bidirectional co-routed CR-LSP.
Implementation
To implement loopback detection for a specified static bidirectional co-routed CR-LSP, a
transit node temporarily connects the forward CR-LSP to the reverse CR-LSP and generates a
forwarding entry for the loop so that the transit node can loop all traffic back to the ingress. A
professional monitoring device connected to the ingress monitors data packets that the ingress
sends and receives and checks whether a fault occurs on the link between the ingress and
transit node.
The dichotomy method is used to perform loopback detection by reducing the range of nodes
to be monitored before locating a faulty node. For example, in Figure 1-1129, loopback
detection is enabled for a static bidirectional co-routed CR-LSP established between PE1
(ingress) and PE2 (egress). The process of using loopback detection to locate a fault is as
follows:
1. Loopback is enabled on P1 to loop data packets back to the ingress. The ingress checks
whether the sent packets match the received ones.
− If the packets do not match, a fault occurs on the link between PE1 and P1.
Loopback detection can then be disabled on P1.
− If the packets match, the link between PE1 and P1 is working properly. The fault
location continues.
2. Loopback is disabled on P1 and enabled on P2 to loop data packets back to the ingress.
The ingress checks whether the sent packets match the received ones.
− If the packets do not match, a fault occurs on the link between P1 and P2. Loopback
detection can then be disabled on P2.
− If the packets match, a fault occurs on the link between P2 and PE2. Loopback
detection can then be disabled on P2.
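The dichotomy method above is a binary search over the transit nodes. The sketch below is illustrative: `link_ok` stands in for "loopback on this node returns packets that match what the ingress sent", and the node names are hypothetical.

```python
def locate_fault(transit_nodes, link_ok):
    """Binary-search sketch of the dichotomy method: enable loopback on
    a midpoint node, check whether the looped-back packets match what
    the ingress sent, and halve the search range accordingly.

    link_ok(i) means the path from the ingress up to transit_nodes[i]
    is healthy. Returns the index of the first node whose upstream
    segment is faulty; len(transit_nodes) means the fault lies between
    the last transit node and the egress.
    """
    lo, hi = 0, len(transit_nodes)
    while lo < hi:
        mid = (lo + hi) // 2
        if link_ok(mid):   # path ingress..transit_nodes[mid] is fine
            lo = mid + 1   # the fault lies further downstream
        else:
            hi = mid       # the fault lies at or before this segment
    return lo

# Hypothetical example: the fault sits on the link entering P3, so
# loopback succeeds on P1 and P2 but fails from P3 onward.
nodes = ["P1", "P2", "P3", "P4"]
assert locate_fault(nodes, link_ok=lambda i: i < 2) == 2
```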
Loopback detection information is not saved in a configuration file after loopback detection is enabled.
A loopback detection-enabled node loops traffic back to the ingress through a temporary loop. Loopback
alarms can then be generated to prompt users that loopback detection is performed. After loopback
detection finishes, it can be manually or automatically disabled. Loopback detection configuration takes
effect only on a main control board. After a master/slave main control board switchover is performed,
loopback detection is automatically disabled.
Benefits
Loopback detection for a static bidirectional co-routed CR-LSP helps rapidly locate faults,
such as minor packet loss or bit errors, and improves network operation and maintenance
efficiency.
Principles
RSVP messages are sent over Raw IP, which provides no security mechanism. These
messages are easy to modify, and a device receiving them is exposed to attacks.
RSVP authentication prevents the following situations and improves device security:
An unauthorized remote router sets up an RSVP neighbor relationship with the local
router.
A remote router constructs forged RSVP messages to set up an RSVP neighbor
relationship with the local router and initiates attacks (such as maliciously reserving a
large number of bandwidths) to the local router.
RSVP authentication parameters are as follows:
Key
The same key must be configured on two RSVP nodes before they perform RSVP
authentication. A node uses this key to compute a digest for a packet to be sent based on
the HMAC (Keyed-Hashing for Message Authentication)-Message Digest 5 (MD5)
algorithm or Secure Hash Algorithm (SHA). The packet carrying the digest as an
integrity object is sent to a remote node. After receiving the packet, the remote node uses
the same key and algorithm to compute a digest for the packet, and compares the
computed digest with the one carried in the packet. If they are the same, the packet is
accepted; if they are different, the packet is discarded.
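The digest computation and comparison can be sketched with Python's standard hmac module. The key and message bytes below are illustrative, and real RSVP INTEGRITY processing hashes the message with the digest field zeroed, which this sketch omits:

```python
import hmac
import hashlib

def compute_digest(key: bytes, message: bytes, algorithm: str = "md5") -> bytes:
    """Compute an HMAC digest over an RSVP message (illustrative sketch)."""
    return hmac.new(key, message, getattr(hashlib, algorithm)).digest()

def verify_digest(key: bytes, message: bytes, received: bytes,
                  algorithm: str = "md5") -> bool:
    """Recompute the digest with the shared key and compare in constant time."""
    expected = compute_digest(key, message, algorithm)
    return hmac.compare_digest(expected, received)

key = b"shared-rsvp-key"          # must be identical on both RSVP neighbors
msg = b"RSVP PATH message bytes"  # digest travels in an INTEGRITY object

digest = compute_digest(key, msg, "sha1")
assert verify_digest(key, msg, digest, "sha1")               # same key: packet accepted
assert not verify_digest(b"wrong-key", msg, digest, "sha1")  # key mismatch: packet discarded
```

The same flow applies with HMAC-MD5; only the hash function passed to `hmac.new` changes.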
Sequence number
Each packet is assigned a 64-bit monotonically increasing sequence number
before being sent, which prevents replay attacks. After receiving the packet, the remote
node checks whether or not the sequence number is in an allowable window. If the
sequence number in the packet is smaller than the lower limit defined in the window, the
receiver considers the packet as a replay packet and discards it.
RSVP authentication also introduces handshake messages. If a receiver receives the first
packet from a transmit end or packet mis-sequence occurs, handshake messages are used
to synchronize the sequence number windows between the RSVP neighboring nodes.
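The receiver-side window check can be sketched as follows; the window size and the exact lower-limit rule are assumptions for illustration (real RSVP implementations synchronize windows through the handshake described above and also detect duplicates within the window):

```python
class ReplayWindow:
    """Sketch of the receiver-side sequence-number check (assumed window logic)."""
    def __init__(self, window_size: int = 32):
        self.window_size = window_size
        self.highest = None  # highest sequence number accepted so far

    def accept(self, seq: int) -> bool:
        if self.highest is None:      # first packet: real RSVP syncs via handshake
            self.highest = seq
            return True
        lower_limit = self.highest - self.window_size + 1
        if seq < lower_limit:         # below the window: treated as a replay, discarded
            return False
        self.highest = max(self.highest, seq)
        return True

win = ReplayWindow(window_size=32)
assert win.accept(1000)      # first packet accepted
assert win.accept(1005)      # monotonically increasing: accepted
assert win.accept(990)       # still inside the allowable window: accepted
assert not win.accept(100)   # far below the lower limit: replay, discarded
```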
Authentication lifetime
Network flapping causes an RSVP neighbor relationship to be alternately deleted and
re-created. Each time the RSVP neighbor relationship is created, the handshake
process is performed, which delays the establishment of a CR-LSP. The RSVP
authentication lifetime is introduced to resolve the problem. If a network flaps, a
CR-LSP is deleted and created. During the deletion, the RSVP neighbor relationship
associated with the CR-LSP is retained until the RSVP authentication lifetime expires.
Background
Service packets exchanged by two nodes must travel through the same links and nodes on a
transport network without running a routing protocol. Co-routed bidirectional static CR-LSPs
can be used to meet this requirement.
Definition
A co-routed bidirectional static CR-LSP is a type of CR-LSP over which two flows are
transmitted in opposite directions over the same links. A co-routed bidirectional static
CR-LSP is established manually.
A co-routed bidirectional static CR-LSP differs from two LSPs that transmit traffic in opposite
directions. Two unidirectional CR-LSPs bound to a co-routed bidirectional static CR-LSP
function as a single CR-LSP. Two forwarding tables are used to forward traffic in opposite
directions. The co-routed bidirectional static CR-LSP can go Up only when the conditions for
forwarding traffic in opposite directions are met. If the conditions for forwarding traffic in one
direction are not met, the bidirectional CR-LSP is in the Down state. Even if no IP forwarding
capability is enabled on the bidirectional CR-LSP, any intermediate node on the
bidirectional LSP can reply with a packet along the original path. The co-routed bidirectional
static CR-LSP supports the consistent delay and jitter for packets transmitted in opposite
directions, which guarantees QoS for traffic transmitted in opposite directions.
Implementation
A bidirectional co-routed static CR-LSP is manually established. A user manually specifies
labels and forwarding entries mapped to two FECs for traffic transmitted in opposite
directions. The outgoing label of a local node (also known as an upstream node) is equal to
the incoming label of a downstream node of the local node.
A node on a co-routed bidirectional static CR-LSP only has information about the local LSP
and cannot obtain information about nodes on the other LSP. A co-routed bidirectional static
CR-LSP shown in Figure 1-1130 consists of a CR-LSP and a reverse CR-LSP. The CR-LSP
originates from the ingress and terminates on the egress. Its reverse CR-LSP originates from
the egress and terminates on the ingress.
On the ingress, configure a tunnel interface and enable MPLS TE on the outbound
interface of the ingress. If the outbound interface is Up and has available bandwidth
higher than the bandwidth to be reserved, the associated bidirectional static CR-LSP can
go Up, regardless of the existence of transit nodes or the egress node.
On each transit node, enable MPLS TE on the outbound interface of the bidirectional
CR-LSP. If the outbound interface is Up and has available bandwidth higher than the
bandwidth to be reserved for the forward and reverse CR-LSPs, the associated
bidirectional static CR-LSP can go Up, regardless of the existence of the ingress, other
transit nodes, or the egress node.
On the egress, enable MPLS TE on the inbound interface. If the inbound interface is Up
and has available bandwidth higher than the bandwidth to be reserved for the
bidirectional CR-LSP, the associated bidirectional static CR-LSP can go Up, regardless
of the existence of the ingress node or transit nodes.
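The per-node conditions above reduce to a local check: each node only verifies that its own MPLS TE interface is Up and has sufficient available bandwidth, regardless of the state of other nodes. A minimal sketch with hypothetical bandwidth values:

```python
def lsp_can_go_up(interfaces):
    """A node's local check for a static bidirectional CR-LSP: every relevant
    interface must be Up with available bandwidth covering the reservation."""
    return all(intf["up"] and intf["available_bw"] >= intf["reserved_bw"]
               for intf in interfaces)

# Ingress: one MPLS TE-enabled outbound interface (values illustrative)
ingress = [{"up": True, "available_bw": 100, "reserved_bw": 50}]
assert lsp_can_go_up(ingress)

# Transit node: forward and reverse directions are both checked
transit = [
    {"up": True, "available_bw": 100, "reserved_bw": 50},  # forward CR-LSP
    {"up": True, "available_bw": 40,  "reserved_bw": 50},  # reverse CR-LSP
]
assert not lsp_can_go_up(transit)  # reverse direction lacks bandwidth: LSP stays Down
```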
Background
MPLS networks face the following challenges:
Traffic congestion: RSVP-TE tunnels are unidirectional. The ingress forwards services to
the egress along an RSVP-TE tunnel. The egress forwards services to the ingress over IP
routes. As a result, the services may be congested because IP links do not reserve
bandwidth for these services.
Traffic interruptions: Two MPLS TE tunnels in opposite directions are established
between the ingress and egress. If a fault occurs on an MPLS TE tunnel, a traffic
switchover can only be performed for the faulty tunnel, but not for the reverse tunnel. As
a result, traffic is interrupted.
A forward CR-LSP and a reverse CR-LSP between two nodes are established. Each CR-LSP
is bound to the ingress of its reverse CR-LSP. The two CR-LSPs then form an associated
bidirectional CR-LSP. The associated bidirectional CR-LSP is mainly used to prevent traffic
congestion. If a fault occurs on one end, the other end is notified of the fault so that both ends
trigger traffic switchovers, ensuring uninterrupted traffic transmission.
Implementation
Figure 1-1131 illustrates an associated bidirectional CR-LSP that consists of Tunnel1 and
Tunnel2. The implementation of the associated bidirectional CR-LSP is as follows:
MPLS TE Tunnel1 and Tunnel2 are established using RSVP-TE signaling or manually.
The tunnel ID and ingress LSR ID of the reverse CR-LSP are specified on each tunnel
interface so that the forward and reverse CR-LSPs are bound to each other. For example,
in Figure 1-1131, set the reverse tunnel ID to 200 and ingress LSR ID to 4.4.4.4 on
Tunnel1 so the reverse tunnel is bound to Tunnel1.
The ingress LSR ID of the reverse CR-LSP is the same as the egress LSR ID of the forward CR-LSP.
The forward and reverse CR-LSPs can be established over the same path or over different paths.
Establishing the forward and reverse CR-LSPs over the same path is recommended to implement the
consistent delay time.
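The binding step can be sketched as associating tunnels by the (ingress LSR ID, tunnel ID) pair configured on each tunnel interface. The reverse tunnel ID 200 and ingress LSR ID 4.4.4.4 come from the Figure 1-1131 example; Tunnel1's own key is assumed for illustration:

```python
# Tunnels keyed by (ingress LSR ID, tunnel ID); Tunnel1's key is hypothetical
tunnels = {
    ("1.1.1.1", 100): {"name": "Tunnel1", "reverse": None},
    ("4.4.4.4", 200): {"name": "Tunnel2", "reverse": None},
}

def bind_reverse(forward_key, reverse_key):
    """Bind each CR-LSP to the ingress of its reverse CR-LSP (both directions),
    forming an associated bidirectional CR-LSP."""
    if reverse_key not in tunnels:
        raise ValueError("reverse tunnel not found")
    tunnels[forward_key]["reverse"] = reverse_key
    tunnels[reverse_key]["reverse"] = forward_key

# On Tunnel1, set the reverse tunnel ID to 200 and ingress LSR ID to 4.4.4.4
bind_reverse(("1.1.1.1", 100), ("4.4.4.4", 200))
assert tunnels[("1.1.1.1", 100)]["reverse"] == ("4.4.4.4", 200)
assert tunnels[("4.4.4.4", 200)]["reverse"] == ("1.1.1.1", 100)
```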
Usage Scenario
An associated bidirectional static CR-LSP transmits services and returned OAM PDUs
on MPLS networks.
An associated bidirectional dynamic CR-LSP is used on an RSVP-TE network when
bit-error-triggered switching is used.
1.12.4.9 CBTS
Class-of-service based tunnel selection (CBTS) is a method of selecting a TE tunnel. Unlike
the traditional method of load-balancing services on TE tunnels, CBTS selects tunnels based
on services' priorities so that high quality resources can be provided for services with higher
priority. In addition, FRR and HSB can be configured for TE tunnels selected by CBTS. For
more information about FRR and HSB, see the section Configuration - MPLS - MPLS TE
Configuration - Configuring MPLS TE Manual FRR and Configuration - MPLS - MPLS TE
Configuration - Configuring CR-LSP Backup.
Background
Existing networks face a challenge that they may fail to provide exclusive high-quality
transmission resources for higher-priority services. This is because the policy for selecting TE
tunnels is based on public network routes or VPN routes, which causes a node to select the
same tunnels for services with the same destination IP or VPN address but with different
priorities.
Traffic classification can be configured on CBTS-capable devices to match incoming services
on the ingress's inbound interface against a specific match rule and map matching services to
configured priorities. A rule can be enforced based on traffic characteristics. Alternatively, a
QoS Policy Propagation Through the Border Gateway Protocol (QPPB) rule can be used
based on BGP community attributes in BGP routes.
Service class attributes can be configured on a tunnel to which services are iterated so that the
tunnel can transmit services with one or more priorities. Services with specified priorities can
only be transmitted on such tunnels, not be load-balanced by all tunnels to which they may be
iterated. The service class attribute of a tunnel can also be set to "default" so that the tunnel
transmits mismatching services with other priorities that are not specified.
Implementation
Figure 1-1132 illustrates CBTS principles. TE tunnels between LSRA and LSRB balance
services, including high-priority voice services, medium-priority Ethernet data services, and
common ATM data services. The implementation of transmitting services of each priority on a
specific tunnel is as follows:
Service classes EF, AF1+AF2, and default are configured for the three TE tunnels,
respectively.
Multi-field classification is configured on the PE to map voice services to EF and map
Ethernet services to AF1 or AF2.
After the preceding configurations are complete, voice services are transmitted along the TE
tunnel that is assigned the EF service class, Ethernet services along the TE tunnel that is
assigned the AF1+AF2 service class, and other services along the TE tunnel that is assigned
the default service class.
The default service class is not a mandatory setting. If it is not configured, mismatching services will be
transmitted along a tunnel that is assigned no service class. If every tunnel is configured with a service
class, these services will be transmitted along a tunnel that is assigned a service class mapped to the
lowest priority. The following service classes are prioritized in ascending order: BE, AF1, AF2, AF3,
AF4, EF, CS6, and CS7.
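The selection rules above (explicit class match, then the "default" class, then a tunnel with no class, then the lowest-priority class) can be sketched as a small selection function; tunnel names and class sets are illustrative:

```python
# Service classes in ascending priority order, as stated in the note above
PRIORITY = ["BE", "AF1", "AF2", "AF3", "AF4", "EF", "CS6", "CS7"]

def select_tunnel(tunnels, service_class):
    """tunnels: list of {"name": str, "classes": set}; an empty set means
    no service class is configured on that tunnel."""
    # 1. A tunnel explicitly configured with this service class wins.
    for t in tunnels:
        if service_class in t["classes"]:
            return t["name"]
    # 2. Otherwise, a tunnel assigned the "default" service class.
    for t in tunnels:
        if "default" in t["classes"]:
            return t["name"]
    # 3. Otherwise, a tunnel with no service class configured.
    for t in tunnels:
        if not t["classes"]:
            return t["name"]
    # 4. Otherwise, the tunnel whose classes include the lowest priority.
    return min(tunnels,
               key=lambda t: min(PRIORITY.index(c) for c in t["classes"]))["name"]

tunnels = [
    {"name": "Tunnel1", "classes": {"EF"}},
    {"name": "Tunnel2", "classes": {"AF1", "AF2"}},
    {"name": "Tunnel3", "classes": {"default"}},
]
assert select_tunnel(tunnels, "EF") == "Tunnel1"   # voice services
assert select_tunnel(tunnels, "AF1") == "Tunnel2"  # Ethernet data services
assert select_tunnel(tunnels, "BE") == "Tunnel3"   # mismatching -> default tunnel
```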
Usage Scenarios
TE tunnels, or TE tunnels in an LDP over TE scenario, are configured on a PE to
load-balance services.
L3VPN, VLL and VPLS services are configured on a PE. Inter-AS VPN services are not
supported.
LDP over TE is configured, and TE tunnels are established to load-balance services on a
P.
1.12.4.10 P2MP TE
Point-to-multipoint (P2MP) traffic engineering (TE) is a promising solution to multicast
service transmission. P2MP TE helps carriers provide high TE capabilities and increased
reliability on an IP/MPLS backbone network and reduce network operational expenditure
(OPEX).
Background
The proliferation of applications, such as IPTV, multimedia conferencing, and massively
multiplayer online role-playing games (MMORPGs), amplifies demands on multicast
transmission over IP/MPLS networks. These services require sufficient network bandwidth,
quality of service (QoS) capabilities, and high reliability. The following multicast solutions
are used to run multicast services, but these solutions fall short of the requirements of
multicast services or network carriers:
IP multicast technology: deployed on a live network by upgrading software. This
solution reduces upgrade and maintenance costs. However, IP multicast, similar to IP
unicast, does not support QoS or TE capabilities and provides low reliability.
Dedicated multicast network: deployed using asynchronous transfer mode (ATM) or
synchronous optical network (SONET)/synchronous digital hierarchy (SDH)
technologies. This solution provides high reliability and transmission rates, but has high
construction costs and requires separate maintenance.
IP/MPLS backbone network carriers require a multicast solution that has high TE capabilities
and can be implemented by upgrading existing devices.
P2MP TE is such a solution. It combines advantages of efficient IP multicast forwarding and
E2E MPLS TE QoS capabilities. P2MP TE establishes a tree-shape tunnel that originates
from an ingress node and is destined for multiple egress nodes and reserves bandwidth for the
multicast packets along the tree path. This provides sufficient bandwidth and QoS capabilities
for multicast services over the tunnel. In addition, a P2MP TE tunnel supports fast reroute
(FRR), which provides high reliability for multicast services.
Benefits
The P2MP TE feature deployed on an IP/MPLS backbone network offers the following
benefits:
Optimizes network bandwidth resource utilization.
Provides bandwidth assurance required by multicast services.
Eliminates the need to use Protocol Independent Multicast (PIM) in the MPLS core.
Related Concepts
The ingress cannot establish a P2MP TE tunnel after detecting either a crossover or re-merge event. A
user can modify an explicit path for a sub-LSP to resolve a crossover or re-merge problem.
Establishing a tunnel
Standard protocols define an RSVP extension that is used to establish a P2MP
TE tunnel. Similar to a P2P TE tunnel, a P2MP TE tunnel is established using
Path and Resv messages that carry RSVP-TE signaling information. Path messages
originate from the ingress and travel along an explicit path to each leaf node. Leaf nodes
reply with Resv messages in the opposite direction of Path messages. After receiving a
Resv message, a node reserves bandwidth for a sub-LSP to be established. After
receiving all Resv messages, the ingress can properly establish a P2MP TE tunnel.
Figure 1-1135 demonstrates the process for establishing a P2MP TE tunnel.
A P2MP TE tunnel is to be established between the ingress PE1 and leaf nodes PE2 and PE3.
This tunnel consists of sub-LSPs over the path PE1 -> P -> PE2 and the path PE1 -> P -> PE3.
PE1 constructs a Path message for each leaf PE and sends the messages over an explicit path.
After receiving the Path message, every leaf PE replies with a Resv message carrying a
label assigned to its upstream node. The MPLS packets share the same incoming label on
the branch node, and the branch node builds a P2MP forwarding table. For example,
P is the branch node shown in Figure 1-1136. Table 1-347 illustrates how a P2MP TE
tunnel is established.
In a VPLS over P2MP scenario or an NG MVPN over P2MP scenario, each service is transmitted
exclusively along a P2MP tunnel.
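The Resv handling above can be sketched as building the branch node's forwarding entry: one incoming label fans out to one (outbound interface, outgoing label) pair per downstream Resv. Interface names and label values below are illustrative, not taken from the figures:

```python
def build_p2mp_entry(incoming_label, resv_messages):
    """Branch node installs one incoming label mapped to a list of
    (out_interface, out_label) branches, one per downstream Resv."""
    return {incoming_label: [(m["interface"], m["label"]) for m in resv_messages]}

# Hypothetical Resv messages arriving at branch node P from its two leaves
resvs = [
    {"interface": "GE0/1", "label": 22},  # label assigned by PE2 in its Resv
    {"interface": "GE0/2", "label": 32},  # label assigned by PE3 in its Resv
]
table = build_p2mp_entry(21, resvs)       # 21: label P assigned to its upstream
assert table == {21: [("GE0/1", 22), ("GE0/2", 32)]}
```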
A P2MP TE tunnel is established on the network shown in Figure 1-1137. P2 is a branch node,
and PE2 is a bud node. Table 1-348 demonstrates the process for forwarding multicast packets
on each node over the P2MP TE tunnel.
Node | Incoming Label | Outgoing Label | Action
PE1 | N/A | L11 | Pushes an outgoing label with the value of 11 into an IP multicast packet and forwards the packet to P1.
P1 | L11 | L21 | Swaps the incoming label with an outgoing label with the value of 21 and forwards the MPLS packet to P2.
P2 (branch node) | L21 | LE22, LE42 | Replicates the packet, swaps the incoming label with an outgoing label in each copy, and forwards each copy to its next hop through a specific outbound interface.
PE2 (bud node) | LE22 | None | Replicates the packet, removes the label from one copy, and forwards that copy to the CE.
PE2 (bud node) | LE22 | LE32 | Swaps the incoming label with outgoing label LE32 in the other copy before forwarding it to PE3.
PE3 | LE32 | None | Removes the label from the packet so that the MPLS packet becomes an IP multicast packet.
PE4 | LE42 | None | Removes the label from the packet so that the MPLS packet becomes an IP multicast packet.
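As a minimal sketch of the forwarding behavior described above (label values simplified from the table, CE names illustrative), replication at branch and bud nodes can be modeled as a recursive walk over per-node label entries, where an outgoing label of None means the label is popped and the packet is delivered as IP multicast:

```python
# Per-node P2MP forwarding entries; each incoming label maps to a list of
# (next_hop, out_label) branches. out_label None = pop and deliver.
FIB = {
    "P1":  {11: [("P2", 21)]},                     # swap 11 -> 21, toward P2
    "P2":  {21: [("PE2", 22), ("PE4", 42)]},       # branch node: replicate
    "PE2": {22: [("CE-PE2", None), ("PE3", 32)]},  # bud node: pop to CE, swap to PE3
    "PE3": {32: [("CE-PE3", None)]},               # egress: pop
    "PE4": {42: [("CE-PE4", None)]},               # egress: pop
}

def forward(fib, node, label, packet):
    """Recursively replicate a packet down the P2MP tree (sketch)."""
    delivered = []
    for next_hop, out_label in fib[node][label]:
        if out_label is None:          # pop: packet leaves MPLS as IP multicast
            delivered.append((next_hop, packet))
        else:                          # swap and continue down the tree
            delivered.extend(forward(fib, next_hop, out_label, packet))
    return delivered

# Ingress PE1 pushes label 11 and sends the packet toward P1
out = forward(FIB, "P1", 11, "ip-multicast-packet")
assert [dest for dest, _ in out] == ["CE-PE2", "CE-PE3", "CE-PE4"]
```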
P2MP TE FRR
Fast reroute (FRR) can protect P2MP and P2P TE tunnels. The NE20E supports FRR link
protection, not node protection, over P2MP TE tunnels. TE FRR establishes a bypass tunnel to
protect sub-LSPs. If a link fails, traffic switches to the bypass tunnel within 50 milliseconds.
The P2P TE bypass tunnel is established over the path P1 -> P5 -> P2 on the network shown
in Figure 1-1138. It protects traffic over the link between P1 and P2. If the link between P1
and P2 fails, P1 switches traffic to the bypass tunnel destined for P2.
An FRR bypass tunnel must be manually configured. An administrator can configure an
explicit path for a bypass tunnel and determine whether or not to plan bandwidth for the
bypass tunnel.
P2P and P2MP TE tunnels can share a bypass tunnel. FRR protection functions for P2P and P2MP TE
tunnels are as follows:
A bypass tunnel with planned bandwidth can be bound to a specific number of both P2P
and P2MP tunnels in configuration sequence. The total bandwidth of the bound P2P and P2MP
tunnels must be lower than or equal to the bandwidth of the bypass tunnel.
A bypass tunnel with no bandwidth can also be bound to both P2P and P2MP TE tunnels.
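The bandwidth rule above can be sketched as a simple admission check, binding protected tunnels in configuration order until the bypass tunnel's bandwidth would be exceeded (values in Mbit/s, illustrative):

```python
def can_bind(bypass_bw, bound_bws, new_bw):
    """A bypass tunnel with planned bandwidth accepts another protected tunnel
    only if the total bound bandwidth stays within the bypass bandwidth."""
    return sum(bound_bws) + new_bw <= bypass_bw

bound = []
for tunnel_bw in [30, 40, 20, 25]:   # P2P and P2MP tunnels, in configuration order
    if can_bind(100, bound, tunnel_bw):
        bound.append(tunnel_bw)

assert bound == [30, 40, 20]         # the 25 Mbit/s tunnel would exceed 100 Mbit/s
```

A bypass tunnel configured with no bandwidth skips this check entirely, matching the second bullet above.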
Supported Function | Description
1.12.4.3.1 Tunnel Re-optimization | Enables the ingress to reestablish a CR-LSP over a better path.
Tunnel re-optimization is implemented in either of the following
modes:
Periodic re-optimization
When the specified interval for optimizing a CR-LSP expires,
Constraint Shortest Path First (CSPF) is triggered to calculate the
path of the CR-LSP. If the path calculated by CSPF has a metric
smaller than that of the existing CR-LSP, a new CR-LSP is
established along the new path. If the CR-LSP is successfully
established, the system notifies the forwarding plane to switch
traffic and tear down the original CR-LSP. After the process,
re-optimization is complete. If the CR-LSP is not set up, the
traffic is still forwarded along the existing CR-LSP.
Manual re-optimization
A re-optimization command is run in the user view to trigger
re-optimization.
1.12.4.11 Applications
1.12.4.11.1 P2MP TE Applications for IPTV
Service Overview
There is an increasing diversity of multicast services, such as IPTV, multimedia conferences,
and massively multiplayer online role-playing games (MMORPGs). These services are
transmitted over a service bearer network that provides the following
functions:
Forwards multicast traffic even during traffic congestion.
Rapidly detects network faults and switches traffic to a standby link.
Networking Description
Point-to-multipoint (P2MP) traffic engineering (TE) supported on NE20Es is used on the
IP/MPLS backbone network shown in Figure 1-1139. P2MP TE helps the network prevent
multicast traffic congestion and maintain reliability.
Feature Deployment
Figure 1-1139 illustrates how P2MP TE tunnels are used to transmit IP multicast services. The
process consists of the following stages:
Import multicast services.
Definition
Seamless MPLS is a bearer technique that extends MPLS techniques to access networks.
Seamless MPLS establishes an E2E LSP across the access, aggregation, and core layers. All
services can be encapsulated using MPLS at the access layer and transmitted along the E2E
LSP across the three layers.
Purpose
MPLS is a mature and well-known technology and has been adopted by a growing number of
service providers in network construction. MPLS can integrate multiple networks on an
Ethernet-based infrastructure, making full use of the benefits of a uniform forwarding model
and reducing network construction costs. MPLS has been widely used on aggregation and
core networks.
With current trends moving towards a flat network structure, metropolitan area networks
(MANs) are steadily evolving into the Ethernet architecture, which calls for the application of
MPLS on the MAN and access networks. To meet this requirement, seamless MPLS was
developed. Seamless MPLS uses existing BGP, IGP, and MPLS techniques to establish an
E2E LSP across the access, aggregation, and core layers, allowing end-to-end traffic to be
encapsulated and forwarded using MPLS.
Benefits
Seamless MPLS offers the following benefits:
Integrates the access, aggregation, and core layers into one MPLS network, encapsulates
all services using MPLS, and transmits these services along an E2E LSP. Seamless
MPLS simplifies network provisioning, operation, and maintenance.
Supports high deployment flexibility and scalability. On a seamless MPLS network, an
LSP can be established between any two nodes to roll out services.
1.12.5.2 Principles
1.12.5.2.1 Basic Principles of Seamless MPLS
Usage Scenario
Seamless MPLS establishes a BGP LSP across the access, aggregation, and core layers and
transmits services along the E2E BGP LSP. Service traffic can be transmitted between any
two points on the LSP. The seamless MPLS network architecture maximizes service
scalability using the following functions:
Allows access nodes to signal all services to an LSP.
Uses the same transport layer convergence technique to rectify all network-side faults,
without affecting service transmission.
Seamless MPLS networking solutions are as follows:
Intra-AS seamless MPLS: The access, aggregation, and core layers are within a single
AS. Intra-AS seamless MPLS applies to mobile bearer networks.
Inter-AS seamless MPLS: The access and aggregation layers are within a single AS,
whereas the core layer is in another AS. Inter-AS seamless MPLS is mainly used to
transmit enterprise services.
Inter-AS seamless MPLS+HVPN: A cell site gateway (CSG) and an aggregation (AGG)
node establish an HVPN connection, and the AGG and a mobile aggregate service
gateway (MASG) establish a seamless MPLS LSP. The AGG provides hierarchical
L3VPN access services and routing management services. Seamless MPLS+HVPN
combines the advantages of both MPLS and HVPN. Seamless MPLS allows any two
nodes on an inter-AS LSP to transmit services at the access, aggregation, and core layers,
providing high service scalability. HVPN enables carriers to reduce network deployment
costs by deploying devices with layer-specific capacities to meet service requirements.
Network Deployment | Description
Control plane: Deploy routing protocols. | Figure 1-1140 Deploying routing protocols for the intra-AS seamless MPLS networking
As shown in Figure 1-1140, routing protocols are deployed on devices as follows:
An IGP (IS-IS or OSPF) is enabled on devices at each of the access, aggregation, and core layers to implement intra-AS connectivity.
The path CSG1 -> AGG1 -> core ABR1 -> MASG1 is used in the following example. An IBGP peer relationship is established between each of the following pairs of devices:
− CSG and AGG
− AGG and core ABR
− Core ABR and MASG
The AGG and core ABR are configured as route reflectors (RRs) so that the CSG and MASG can obtain routes destined for each other's loopback addresses.
The AGG and core ABR set the next hop addresses in BGP routes to their own addresses to prevent advertising unnecessary IGP area-specific public routes.
Deploy tunnels. | Figure 1-1141 Deploying tunnels for the intra-AS seamless MPLS networking
Forwarding plane | Figure 1-1142 Forwarding plane for the intra-AS seamless MPLS networking
Network Deployment | Description
Deploy routing protocols. | Figure 1-1143 Deploying routing protocols for the inter-AS seamless MPLS networking
Deploy tunnels. | Figure 1-1144 Deploying tunnels for the inter-AS seamless MPLS networking
Forwarding plane | Figure 1-1145 Forwarding plane for the inter-AS seamless MPLS networking with a BGP LSP established in the core area
The VPN packet transmission along the inter-AS seamless MPLS tunnel is complete.
Network Deployment | Description
Control plane: Deploy routing protocols. | Figure 1-1147 Deploying routing protocols for the inter-AS seamless MPLS+HVPN networking
Deploy tunnels. | Figure 1-1148 Deploying tunnels for the inter-AS seamless MPLS+HVPN networking
A public network tunnel is established using LDP or TE in
each IGP area.
The AGGs, AGG ASBRs, core ASBRs, and MASGs are
enabled to advertise labeled routes. They assign labels to
BGP routes that match a specified routing policy. After they
exchange BGP routes, a BGP LSP can be established
between each pair of an AGG and MASG.
Forwarding plane Figure 1-1149 illustrates the forwarding plane of the inter-AS
seamless MPLS+HVPN networking. Seamless MPLS is mainly
used to transmit VPN packets. The following example
demonstrates how VPN packets, including labels and data, are
transmitted from a CSG to an MASG along the path CSG2 ->
AGG1 -> AGG ASBR1 -> core ASBR1-> MASG1.
1. The CSG pushes an MPLS tunnel label into each VPN
packet and forwards the packets to the AGG.
2. The AGG removes the access-layer MPLS tunnel labels
from the packets and pushes a BGP LSP label. It then adds
aggregation-layer MPLS tunnel labels to the packets and
then proceeds to forward them to the AGG ASBR. If the PHP
function is enabled on the AGG, the CSG has removed the
MPLS tunnel labels from the packets, and therefore, the
AGG receives packets without MPLS tunnel labels.
3. The AGG ASBR then removes the MPLS tunnel labels from
packets and swaps the existing BGP LSP label for a new
label in each packet. It then forwards the packets to the core
ASBR. If the PHP function is enabled on the AGG ASBR,
the AGG has removed the MPLS tunnel labels from the
packets, and therefore, the AGG ASBR receives packets
without MPLS tunnel labels.
4. After the core ASBR receives the packets, it swaps a BGP
LSP label for a new label and adds a core-layer MPLS tunnel
label to each packet. It then forwards the packets to the
MASG.
5. The MASG removes MPLS tunnel labels, BGP LSP labels,
and VPN labels from the packets. If the PHP function is
enabled on the MASG, the core ASBR has removed the
MPLS tunnel labels from the packets, and therefore, the
MASG receives packets without MPLS tunnel labels.
The VPN packet transmission along the seamless MPLS
tunnel is complete.
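The five forwarding steps above can be sketched as label-stack operations per node. Label names are hypothetical, and PHP is assumed to be disabled so that every pop is explicit:

```python
# Top of the label stack is the first list element.
def push(stack, label): return [label] + stack
def pop(stack):         return stack[1:]
def swap(stack, label): return [label] + stack[1:]

pkt = push(["VPN"], "TUN-acc")  # 1. CSG: VPN label, then access-layer tunnel label

# 2. AGG: pop access tunnel label, push BGP LSP label, push agg tunnel label
pkt = push(push(pop(pkt), "BGP-1"), "TUN-agg")

# 3. AGG ASBR: pop agg tunnel label, swap the BGP LSP label for a new one
pkt = swap(pop(pkt), "BGP-2")

# 4. Core ASBR: swap the BGP LSP label, push the core-layer tunnel label
pkt = push(swap(pkt, "BGP-3"), "TUN-core")
assert pkt == ["TUN-core", "BGP-3", "VPN"]

# 5. MASG: pop the tunnel label, the BGP LSP label, and the VPN label
pkt = pop(pop(pop(pkt)))
assert pkt == []    # bare payload remains; transmission is complete
```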
Reliability
Seamless MPLS network reliability can be improved using a variety of functions. If a network
fault occurs, devices with reliability functions enabled immediately detect the fault and switch
traffic from active links to standby links.
The following examples demonstrate the reliability functions used on an inter-AS seamless
MPLS network.
A fault occurs on a link between a CSG and an AGG.
As shown in Figure 1-1150, the active link along the primary path between CSG1 and
AGG1 fails. After BFD for LDP or BFD for CR-LSP detects the fault, the BFD module
uses LDP FRR, TE Hot-standby or BGP FRR to switch traffic from the primary path to
the backup path.
Figure 1-1150 Traffic protection triggered by a fault in the link between the CSG and AGG
on the inter-AS seamless MPLS network
As shown in Figure 1-1151, BGP Auto FRR is configured on CSGs and AGG ASBRs to
protect traffic on the BGP LSP between CSG1 and MASG1. If BFD for LDP or BFD for
TE detects AGG1 faults, the BFD module switches traffic from the primary path to the
backup path.
Figure 1-1151 Traffic protection triggered by a fault in an AGG on the inter-AS seamless
MPLS network
Figure 1-1152 Traffic protection triggered by a fault in the link between an AGG and an
AGG ASBR on the inter-AS seamless MPLS network
Auto FRR switches both upstream and downstream traffic from the primary path to
backup paths.
Figure 1-1153 Traffic protection triggered by a fault in an AGG ASBR on the inter-AS
seamless MPLS network
A fault occurs on the link between an AGG ASBR and a core ASBR.
As shown in Figure 1-1154, BFD for interface is configured on AGG ASBR1 and core
ASBR1. If the BFD module detects a fault in the link between AGG ASBR1 and core
ASBR1, the BFD module triggers the BGP Auto FRR function. BGP Auto FRR switches
both upstream and downstream traffic from the primary path to backup paths.
Figure 1-1154 Traffic protection triggered by a fault in the link between an AGG ASBR and
a core ASBR on the inter-AS seamless MPLS network
As shown in Figure 1-1155, BFD for interface and BGP Auto FRR are configured on
AGG ASBR1. BGP Auto FRR and BFD for LDP (or BFD for TE) are configured on
MASGs to protect traffic on the BGP LSP between CSG1 and MASG1. If the BFD
module detects a fault in core ASBR1, it switches both upstream and downstream traffic
from the primary path to backup paths.
Figure 1-1155 Traffic protection triggered by a fault in a core ASBR on the inter-AS
seamless MPLS network
Figure 1-1156 Traffic protection triggered by a link fault in a core area on the inter-AS
seamless MPLS network
Figure 1-1157 Traffic protection triggered by a fault in an MASG on the inter-AS seamless
MPLS network
Background
The IP/MPLS network shown in Figure 1-1158 transmits VPN services. PEs, such as a CSG,
AGG, ASBR, and MASG, establish multi-segment MPLS tunnels between directly connected
devices. In this case, VPN service provision on PEs is complex, and the VPN service
scalability decreases. As PEs establish BGP peer relationships, a routing policy can be used to
assign MPLS labels for BGP routes so that an E2E BGP tunnel can be established. The BGP
tunnel consists of a primary BGP LSP and a backup BGP LSP. VPN services can travel along
the E2E BGP tunnel, which simplifies service provision and improves VPN service
scalability.
To rapidly detect faults in an E2E BGP tunnel, BFD for BGP tunnel is used. BFD for BGP
tunnel establishes a dynamic BFD session, also called a BGP BFD session, which is bound to
both the primary and backup BGP LSPs. If both BGP LSPs fail, the BGP BFD session detects
the faults and triggers VPN FRR switching.
Usage Scenarios
BFD for BGP tunnel is used in the following scenarios:
Inter-AS VPN Option C scenario
Intra- or inter-AS seamless MPLS scenario
Principles
Dynamic BGP BFD sessions are established using either of the following policies:
Host address-based policy: used when all host addresses are available to trigger the
creation of BGP BFD sessions.
IP address prefix list-based policy: used when only some host addresses can be used to
establish BFD sessions.
A BGP BFD session working in asynchronous mode monitors BGP LSPs over BGP tunnels.
In Figure 1-1159, the ingress (CSG) and egress (MASG) of E2E BGP LSPs exchange BFD
packets periodically. The forward path is a BGP LSP, and the reverse path is an IP route. If
either node receives no BFD packet after a specified detection period elapses, the node
considers the BGP LSP faulty. If both the primary and backup BGP LSPs fail, the BGP BFD
session triggers VPN FRR switching.
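The asynchronous-mode detection logic can be sketched as follows. The detection-time formula (multiplier times receive interval) follows the generic BFD model and the timer values are illustrative, not NE20E specifics:

```python
def bfd_state(last_rx_ms, now_ms, detect_multiplier, rx_interval_ms):
    """Asynchronous mode: the session goes down if no BFD packet arrives
    within the detection time (multiplier x negotiated receive interval)."""
    detection_time_ms = detect_multiplier * rx_interval_ms
    return "up" if (now_ms - last_rx_ms) <= detection_time_ms else "down"

def needs_vpn_frr(primary_state, backup_state):
    """VPN FRR switching is triggered only when both BGP LSPs have failed."""
    return primary_state == "down" and backup_state == "down"

primary = bfd_state(last_rx_ms=0,   now_ms=400, detect_multiplier=3, rx_interval_ms=100)
backup  = bfd_state(last_rx_ms=350, now_ms=400, detect_multiplier=3, rx_interval_ms=100)
assert primary == "down" and backup == "up"
assert not needs_vpn_frr(primary, backup)   # backup BGP LSP still carries traffic
```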
1.12.5.3 Applications
1.12.5.3.1 Seamless MPLS Applications in VPN Services
Service Overview
With the growth of third-generation mobile telecommunications (3G) and Long Term
Evolution (LTE) services, inter-AS leased line services have become key services. To carry
these services over VPNs, seamless MPLS establishes an E2E LSP between a cell site
gateway (CSG) and a mobile aggregate service gateway (MASG) to transmit virtual private
network (VPN) services, as well as helps carriers reduce costs of network construction,
operation, and maintenance. Seamless MPLS also allows carriers to uniformly operate and
maintain networks.
Networking Description
Figure 1-1160 illustrates an LTE network. The access and aggregation layers belong to one
AS, and the core layer belongs to another AS. To transmit VPN services, the inter-AS
seamless MPLS+HVPN networking can be used to establish an LSP between each pair of a
CSG and MASG. CSGs are connected to NodeBs that are Wideband Code Division Multiple
Access (WCDMA) 3G base stations and eNodeBs that are LTE base stations. MASGs are
connected to a mobility management entity (MME) or service gateway (SGW). VPN
instances can be configured between CSGs and MASGs to transmit various types of services.
An HVPN is deployed between each pair of a CSG and aggregation (AGG) node, and an
inter-AS LSP is established between each pair of an AGG and MASG using the seamless
MPLS technique. A NodeB or an eNodeB can then communicate with the MME or SGW.
Enterprise leased line services | Large-scale enterprise VPN services can be provisioned. Layer 2 and Layer 3 leased lines connected to CSGs are easily deployed.
Protection switching | The following protection switching functions can be configured:
TE hot standby or LDP FRR: monitors TE LSPs or LDP LSPs.
BGP FRR: monitors BGP LSPs.
VPN FRR: monitors VPN connections.
CSG performance requirements | CSGs that maintain only a few routes need to process packets, each carrying two labels.
The GMPLS UNI solution is only used for interconnection between Huawei forwarders and Huawei
controllers.
Purpose
In an era when IP technologies evolve quickly and data transmission becomes demanding, IP
services impose higher requirements on bandwidth of transport networks. Mainstream
bandwidth of transport networks has quickly changed from 155 Mbit/s and 622 Mbit/s to 2.5
Gbit/s and 10 Gbit/s, and now to 40 Gbit/s and 100 Gbit/s. The processing granularity
(VC4) of Synchronous Digital Hierarchy (SDH) networks, however, lags behind. In this case,
the Dense Wavelength Division Multiplexing (DWDM) technique has become one of the
options for constructing a transport network. To provide an end-to-end DWDM solution, the
communication issue between routers and DWDM devices must be addressed first.
To be specific, many User-Network Interfaces (UNIs) are statically configured between IP
networks and transport networks, but this configuration has many drawbacks:
Transmission channels between IP networks and transport networks need to be
configured manually, which is time consuming and increases carriers' network
construction cost.
When a fault occurs and both the primary and secondary paths fail, additional
configurations are needed to restore services, increasing carriers' network maintenance
cost.
Bandwidth cannot be dynamically adjusted because IP networks and transport networks
are interconnected based on static configurations. This defect wastes network resources
and leads to unnecessary capacity expansion.
The automatic UNI service deployment feature provided by Generalized Multi-Protocol Label
Switching (GMPLS) properly solves the preceding problems. GMPLS provides packet
switching, wavelength switching, time division switching, and spatial switching, supports
multiple interconnection models between transmission networks and IP networks, and truly
implements an end-to-end solution. GMPLS brings the following benefits:
Simplified network management, intelligent service provisioning, flexible transmission
channel setup, and lower operation and maintenance costs
Abundant protection levels, enhanced network robustness based on an effective
protection recovery mechanism, and lower operation and maintenance costs
Flexible resource allocation policies, improved network resource usage, and lower
pressure on capacity expansion
Definition
GMPLS is developed from MPLS and inherits nearly all MPLS features and protocols.
GMPLS also extends the definition of MPLS labels and can be considered an extension
of MPLS in transmission networks. GMPLS provides a unified control plane for the IP layer
and transport layer. In this manner, the network architecture is simplified, the network
management cost is reduced, and the network performance is optimized.
The GMPLS User-Network Interface (UNI) is defined by IETF as a network connection
interface. It is applicable to the overlay model in the GMPLS network structure and it meets
the trend in network development.
GMPLS UNI extends MPLS in the following aspects:
Supports multiple network interface types and supports switching of packets, timeslots,
wavelengths, and ports.
Supports explicit routes and explicit labels.
Supports bidirectional LSPs.
Separates the control plane from the data plane, supports outband signaling, and prevents
a failure in the control plane from affecting the data plane.
Enables fast fault detection in the control plane and supports end-to-end recovery and
protection.
Supports service security mechanisms and service policy authentication.
Supports LSP graceful deletion.
1.12.6.2 Principles
1.12.6.2.1 Basic Concepts
Generalized Multiprotocol Label Switching (GMPLS) extends the traditional MPLS
technology and applies to the transport layer. To seamlessly integrate the IP and transport
layers, GMPLS extends MPLS labels and uses labels to identify Time Division Multiplexing
(TDM) time divisions, wavelengths, and optical fibers, in addition to data packets. GMPLS
adds labels to packets during IP data switching, TDM electrical circuit switching (primarily
applying to Synchronous Digital Hierarchy [SDH]/Synchronous Optical Network [SONET]),
and spatial switching. GMPLS separates control and data channels and uses the Link
Management Protocol (LMP) to manage and maintain links. GMPLS supports multiple
models for interconnecting the IP and transport networks, meeting requirements for IP and
transport network convergence.
Peer model: Figure 1-1162 shows the peer model networking. IP devices and transport
devices are operating in a single GMPLS domain. IP and transport network topologies
are visible to each other. End-to-end (E2E) GMPLS tunnels can be established and they
originate from an IP network, pass through a transport network, and are destined for
another IP network.
Border peer model: Figure 1-1163 shows the border peer model networking. A transport
network and edge nodes that directly connect the IP networks to the transport network
are in the same GMPLS domain. The transport network topology is invisible to non-edge
nodes on the IP networks. A path for a GMPLS tunnel between the edge nodes across the
transport network can be calculated.
Peer model
Advantages: Both IP address space and signaling protocols can be planned for transport
devices and IP routers. The transport devices and IP routers can establish reliable
connections. This model allows rapid service rollout and planning of E2E optimal paths.
Disadvantages: Using the peer model is difficult because the entire live network must be
upgraded. Transport devices and IP routers need to use the same signaling protocols,
increasing the possibility of security risks.
Border peer model
Advantages: IP routers are isolated from transport devices, except for edge nodes. The
transport network topology is visible to the boundary routers on the IP network.
Disadvantages: The edge nodes must have high performance. Security deteriorates in this
model. This model does not support E2E optimal path planning.
Overlay model
Advantages: Transport and IP network devices must have clearly defined UNI information.
They do not need to learn about routing or topology information of each other or
exchange information. The overlay model provides high security and has low upgrade
requirements.
Disadvantages: Planning E2E optimal paths for GMPLS tunnels is difficult. UNI bandwidth
usage is lower in this model than in the other two models. The overlay model requires
UNI interface planning.
The NE20E only supports the overlay model, in compliance with the GMPLS UNI model
defined in relevant standards. The GMPLS UNI model is used in the following sections.
Figure 1-1164 shows the GMPLS UNI model networking. Edge nodes on overlay networks
running IP are directly connected to transport devices on a core transport network along TE
links. Only the edge nodes can initiate the establishment of a UNI tunnel to travel through the
core network. On the IP networks, only edge nodes need to support GMPLS UNI functionality.
The GMPLS UNI model involves the following concepts:
Ingress EN: refers to an edge node that directly connects an IP network to a transport
network. A GMPLS UNI tunnel originates from the ingress EN.
Ingress CN: refers to an edge node that directly connects a transport network to the
ingress EN.
Egress EN: refers to an edge node that directly connects an IP network to a transport
network. A GMPLS UNI tunnel is destined for the egress EN.
Egress CN: refers to an edge node that directly connects a transport network to the egress
EN.
UNI: sends requests for bandwidth used for connections to the transport network.
Network-network interface (NNI): connects nodes within the transport network.
does not affect the data channel, ensuring uninterrupted service forwarding. The data and
control channels are separated in either out-of-band or in-band mode. Out-of-band separation
means that the data and control channels' physical links are separate. For example, the two
channels use separate physical interfaces, time divisions, or wavelengths. In-band separation
means that the data and control channels use the same physical links but different protocol
overheads. For example, an Ethernet network uses OAM to carry control packets and an SDH
network uses the dial control center (DCC) byte overheads to carry control packets. The
NE20E only supports out-of-band Ethernet channels and in-band Ethernet OAM channels.
LMP
The Link Management Protocol (LMP) used in GMPLS manages links of the control
and data channels. Relevant standards describe the major functions of LMP, including:
Control channel management: Dynamic LMP automatically discovers neighbors and
creates, maintains, and manages a control channel.
Link attribute association: LMP bundles multiple data links between two directly
connected nodes into a TE link, and synchronizes TE link attributes such as switching
types and code types between the two directly connected nodes.
Link connectivity verification: LMP verifies the connectivity of a data channel separated
from a control channel. LMP can verify the connectivity of multiple data channels
simultaneously.
Fault management: LMP rapidly detects data link failures in unidirectional and
bidirectional LSPs, locates and isolates faults, and triggers appropriate protection and
recovery mechanisms. After a fault is removed, LMP sends a notification about link
recovery. Fault management is performed on links only between adjacent nodes.
LMP is classified into the following types:
Static LMP: LMP neighbors are manually configured and no LMP packet needs to be
sent between them.
Dynamic LMP: LMP neighbors, a control channel, a TE link, and data links are all
automatically discovered, minimizing configurations and speeding up network
construction.
The NE20E only supports static LMP. This means that LMP neighbors, control channels, and
data channels are manually configured.
NE1->NE2->NE3 (as indicated by the red line in the figure). In this manner, a direct link from
NE1 to NE3 is established on the transport network, and a GMPLS UNI tunnel with the path
Device1->NE1->NE2->NE3->Device2 is established over the transport network.
labels of both UNI LSPs are assigned to the egress EN. Then the egress EN creates a
Resv message and sends the message to the egress CN.
5. After the Resv message reaches the egress CN, a label of the forward UNI LSP is
assigned to the egress CN and the label is sent to the ingress CN in a Resv message
through the FA tunnel.
6. After the Resv message reaches the ingress CN, a label of the forward UNI LSP is
assigned to the ingress CN and the label is sent to the ingress EN in a Resv message.
7. After the Resv message reaches the ingress EN, a label of the forward UNI LSP is
assigned to the ingress EN.
In this manner, each node is informed of the forward/reverse UNI LSP label of the adjacent
node and a bidirectional UNI LSP is then successfully set up.
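The upstream label distribution in steps 4 through 7 can be sketched as follows. This is an illustrative model only (not NE20E software): the Resv message travels from the egress EN toward the ingress EN, leaving each node with the forward-LSP label assigned by its downstream neighbor. Node names and label values are hypothetical.

```python
def downstream_labels(path, assigned):
    """path: nodes from ingress EN to egress EN.
    assigned: the forward-LSP label each node assigns when the Resv reaches it.
    Returns the outgoing label each node learns from its downstream neighbor."""
    out_label = {}
    # The Resv flows upstream, so node i uses the label assigned by node i+1.
    for i in range(len(path) - 1):
        out_label[path[i]] = assigned[path[i + 1]]
    return out_label

path = ["ingress_EN", "ingress_CN", "egress_CN", "egress_EN"]
assigned = {"ingress_CN": 104, "egress_CN": 103, "egress_EN": 102}
out = downstream_labels(path, assigned)
# The ingress EN now holds the label assigned by the ingress CN, and so on.
```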
1.12.6.2.4 UNI Tunnel Calculation Using Both IP and Optical PCE Servers
Background
IP service provision on E2E backbone networks faces the following challenges:
Optical and IP layers cannot share topology information. Therefore, path planning can
only be manually performed at both the optical and IP layers. Optimizing path
calculation and network resources is difficult.
Inter-layer network deployment is performed by collaboration of IP and optical
departments, which delays service rollout.
To address the preceding challenges, the NE20E uses both the IP Path Computation Element
(PCE) and optical PCE functions to calculate paths for GMPLS UNI tunnels.
With this path calculation function, the IP and optical PCE servers automatically implement
path planning, which reduces manual workload and speeds up service rollout.
Principles
An ingress EN on a GMPLS UNI functions as a PCE client and requests an IP PCE server to
calculate paths. Upon receipt of the request, the IP PCE server works with an optical PCE
server to calculate a path and sends path information to the ingress EN. The ingress EN
automatically establishes a GMPLS UNI tunnel over the calculated path.
In the following example, the IP and optical PCE servers are used simultaneously to calculate
a path for a GMPLS UNI tunnel between the ingress EN and egress EN. The implementation
is as follows:
1. The ingress EN sends a delegate path request for a GMPLS UNI tunnel to an IP PCE
server.
2. Upon receipt of the request, the IP PCE server instructs the optical PCE server to
calculate a path within an optical network.
3. The optical PCE server sends path information to the IP PCE server.
4. The IP PCE server sends all path information to the ingress EN.
5. The ingress EN sends RSVP messages to the ingress CN and starts to establish a
GMPLS UNI tunnel. The GMPLS UNI tunnel establishment process is similar to the
common GMPLS UNI tunnel establishment process and is not described here.
Figure 1-1167 Flowchart for using both the IP and optical PCE servers to calculate paths
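The five-step delegation above can be modeled with plain functions. This is an illustrative sketch only, not a real PCEP implementation; the device and NE names reuse the earlier example path (Device1->NE1->NE2->NE3->Device2), and the optical hop list is hypothetical.

```python
def optical_pce_compute(src_cn, dst_cn):
    # Steps 2-3: the optical PCE computes the segment inside the optical network.
    return [src_cn, "NE2", dst_cn]

def ip_pce_compute(ingress_en, egress_en, ingress_cn, egress_cn):
    # Step 4: the IP PCE stitches the optical segment into the end-to-end path
    # that it returns to the ingress EN.
    return [ingress_en] + optical_pce_compute(ingress_cn, egress_cn) + [egress_en]

# Step 1: the ingress EN delegates path computation to the IP PCE server;
# step 5 would then signal the returned path with RSVP.
path = ip_pce_compute("Device1", "Device2", "NE1", "NE3")
```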
Benefits
Simultaneously using the IP and optical PCE servers to calculate a path for a GMPLS UNI
tunnel offers the following benefits:
Automates a large amount of site deployment planning, which reduces labor
costs.
Eliminates the need for collaboration between the IP and optical service departments,
which speeds up site deployment.
1.12.6.2.5 SRLG Sharing Between Optical and IP Layers Within a Transport Network
Background
Although the IP layer and optical layer are connected, they cannot exchange routing
information. The active and standby links at the IP layer can only be separated using statically
planned SRLGs within the optical network, which delays service rollout and increases
maintenance workload. To address these problems, the SRLG sharing function can be used.
RSVP signaling at the optical layer sends SRLG attributes of transport links to the IP layer.
The IP layer applies the SRLG attributes to IP links. This function helps select reliable paths
for high reliability services at the IP layer based on SRLG constraints.
Principles
When a GMPLS UNI tunnel is established using RSVP, the extended RSVP protocol carries
SRLG information on optical links to both ends of the GMPLS UNI tunnel. SRLG
information is processed as TE SRLG information that is used to bind the GMPLS UNI tunnel
to UNI links, which separates links for the primary and backup TE tunnels.
RSVP Path or Resv messages carry SRLG sub-objects to notify the IP layer of SRLG
information about paths on an optical network. The ingress CN and egress CN at the IP layer
flood the SRLG information to the other devices at the IP layer. Then all devices on the
network can establish the primary and backup TE tunnels on different links, preventing path
overlapping.
Figure 1-1168 SRLG sharing between optical and IP layers within a transport network
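Once the optical layer's SRLG attributes have been flooded to the IP layer, an ingress can reject any backup path that shares an SRLG with the primary path. The following is an illustrative sketch of that disjointness check; link names and SRLG numbers are hypothetical.

```python
link_srlgs = {
    "linkA": {10, 20},   # SRLGs learned from the optical layer via RSVP
    "linkB": {30},
    "linkC": {20, 40},   # shares SRLG 20 with linkA
}

def srlg_disjoint(primary_links, backup_links):
    """True if the primary and backup paths share no SRLG."""
    primary = set().union(*(link_srlgs[l] for l in primary_links))
    backup = set().union(*(link_srlgs[l] for l in backup_links))
    return primary.isdisjoint(backup)

# linkB can back up linkA; linkC cannot, because both belong to SRLG 20.
```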
A GMPLS UNI tunnel is bidirectional, so both ends of the tunnel need to be bound to
logical GMPLS UNI interfaces and both logical interfaces need to be advertised to their
respective IP networks. The statuses of the bound logical interfaces are associated with the
GMPLS UNI status. If the UNI LSP is established, the bound logical interfaces go Up. If no
UNI LSP is established, the bound logical interfaces go Down. In real-world situations, a
GMPLS UNI tunnel is configured on logical interfaces in a way similar to the configuration
of a routing protocol or the MPLS function on the logical interfaces. This makes the
configuration of a GMPLS UNI tunnel easier for users to accept.
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
Network planning engineers
Commissioning engineers
Data configuration engineers
System maintenance engineers
Security Declaration
Encryption algorithm declaration
The encryption algorithms DES/3DES/SKIPJACK/RC2/RSA (RSA-1024 or
lower)/MD2/MD4/MD5 (in digital signature scenarios and password encryption)/SHA1
(in digital signature scenarios) have low security and may bring security risks. If the
protocols allow, using more secure encryption algorithms, such as AES/RSA
(RSA-2048 or higher)/SHA2/HMAC-SHA2, is recommended.
Special Declaration
This document serves only as a guide. The content is written based on device
information gathered under lab conditions. The content provided by this document is
intended to be taken as general guidance, and does not cover all scenarios. The content
provided by this document may be different from the information on user device
interfaces due to factors such as version upgrades and differences in device models,
board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are
beyond the scope of this document.
The maximum values provided in this document are obtained in specific lab
environments (for example, only a certain type of board or protocol is configured on a
tested device). The actually obtained maximum values may be different from the
maximum values provided in this document due to factors such as differences in
hardware configurations and carried services.
Interface numbers used in this document are examples. Use the existing interface
numbers on devices for configuration.
The pictures of hardware in this document are for reference only.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
Indicates an imminently hazardous situation which, if not
avoided, will result in death or serious injury.
Change History
Updates between document issues are cumulative. Therefore, the latest document issue
contains all updates made in previous issues.
Changes in Issue 03 (2017-09-20)
This issue is the third official release. The software version of this issue is
V800R009C10SPC200.
Changes in Issue 02 (2017-07-30)
This issue is the second official release. The software version of this issue is
V800R009C10SPC100.
Changes in Issue 01 (2017-05-30)
This issue is the first official release. The software version of this issue is
V800R009C10.
Definition
Segment routing (SR) is a protocol designed to forward data packets on a network based on
source routes. Segment routing divides a network path into several segments and assigns a
segment ID to each segment and network forwarding node. The segments and nodes are
sequentially arranged (segment list) to form a forwarding path.
Segment routing encodes the segment list identifying a forwarding path into a data packet
header. The segment ID is transmitted along with the packet. After receiving the data packet,
the receive end parses the segment list. If the top segment ID in the segment list identifies the
local node, the node removes the segment ID and proceeds with the follow-up procedure. If
the top segment ID does not identify the local node, the node uses the Equal-Cost
Multipath (ECMP) algorithm to forward the packet to the next node.
Purpose
As times progress, more and more types of services pose a variety of network
requirements. For example, real-time UC&C applications prefer paths with low delay and low
jitter, and big data applications prefer high-bandwidth tunnels with a low packet loss rate.
In this situation, the conventional approach of adapting the network to service growth cannot
keep pace with rapid service development and even makes network deployment more
complex and difficult to maintain.
The solution is to allow services to drive network development and to define the network
architecture. Specifically, an application raises requirements (on the delay, bandwidth, and
packet loss rate). A controller collects information, such as network topology, bandwidth
usage, and delay information and computes an explicit path that satisfies the service
requirements.
Segment routing emerged in this context. Segment routing can define an explicit path simply;
nodes need to maintain only the segment routing information to adapt to rapid service
growth in real time. Segment routing has the following characteristics:
Extends existing protocols, such as IGP, to allow for smoother evolution of live
networks.
Supports both the controller's centralized control mode and the forwarder's
distributed control mode, providing a balance between centralized and distributed
control.
Uses the source routing technique to provide capabilities of rapid interaction between
networks and upper-layer applications.
Benefits
Segment routing offers the following benefits:
The control plane of the MPLS network is simplified.
A controller or an IGP is used to uniformly compute paths and distribute labels, without
using RSVP-TE or LDP. Segment Routing can be directly applied to the MPLS
architecture without any change in the forwarding plane.
Provides efficient topology-independent loop-free alternate (TI-LFA) FRR protection for
fast path failure recovery.
Based on the segment routing technology combined with the Remote Loop-Free
Alternate (RLFA) FRR algorithm, an efficient TI-LFA FRR algorithm is formed. TI-LFA
FRR supports node and link protection in any topology and overcomes drawbacks of
conventional tunnel protection.
Provides higher network capacity expansion capability.
1.13.2.3 Principles
1.13.2.3.1 Basic Principles
Basic Concepts
Segment routing involves the following concepts:
Segment routing domain: is a set of SR nodes.
Segment ID (SID): uniquely identifies a segment. A SID is mapped to an MPLS label on
the forwarding plane.
SRGB: A segment routing global block (SRGB) is a set of local labels reserved for
segment routing.
Segment Category
An example of Prefix SIDs, Adjacency SIDs, and Node SIDs is shown in Figure 1-1171.
In simple words, a prefix segment indicates a destination address, and an adjacency segment
indicates a link over which data packets travel. The prefix and adjacency segments are similar
to the destination IP address and outbound interface, respectively, in conventional IP
forwarding. In an IGP area, a network element (NE) sends extended IGP messages to flood its
own node SID and adjacency SID. Upon receipt of the message, any NE can obtain
information about the other NEs.
Combining prefix (node) SIDs and adjacency SIDs in sequence can construct any network
path. Every hop on a path identifies a next hop based on the segment information on the top of
the label stack. The segment information is stacked in sequence at the top of the data header.
If segment information at the stack top contains the identifier of another node, the
receive end forwards a data packet to a next hop using ECMP.
If segment information at the stack identifies the local node, the receive end removes the
top segment and proceeds with the follow-up procedure.
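The stack-top rule above can be sketched as a small function: pop when the top segment identifies the local node, otherwise forward toward the node the top segment identifies (ECMP next-hop selection is not modeled here).

```python
def process_segment_list(local_sid, segment_list):
    """Returns ('pop', remaining_list) or ('forward', top_sid)."""
    top = segment_list[0]
    if top == local_sid:
        # Top segment identifies this node: remove it and continue processing.
        return "pop", segment_list[1:]
    # Top segment identifies another node: forward toward it.
    return "forward", top
```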
In actual application, the prefix segment, adjacency segment, and node segment can be used
independently or in combinations. The following three main cases are involved.
Prefix Segment
A prefix segment-based forwarding path is computed by an IGP using the SPF algorithm. In
Figure 1-1172, node Z is a destination, and its prefix SID is 100. After an IGP floods the
prefix SID, all nodes in the IGP area learn the prefix SID of node Z. Each node runs SPF to
compute the shortest path to node Z. Such a path is the smallest-cost path.
If several paths have the same cost, they work in ECMP mode. If they have different costs,
they work in link backup mode. Prefix segment-based forwarding paths are not fixed, and
the ingress cannot control the entire forwarding path.
Adjacency Segment
In Figure 1-1173, an adjacency segment is assigned to each adjacency. The adjacency
segments are contained in a segment list defined on the ingress. The segment list is used to
strictly specify any explicit path. This mode can better implement SDN.
SR Forwarding Mechanism
SR can be used directly in the MPLS architecture, where the forwarding mechanism remains unchanged.
SIDs are encoded as MPLS labels. The segment list is encoded as a label stack. The segment
to be processed is at the stack top. Once a segment is processed, its label is removed from a
label stack.
1.13.2.3.2 SR LSP
An SR LSP is established using the segment routing technique and uses prefix or node
segments to guide data packet forwarding. Segment Routing Best Effort (SR-BE) uses an IGP
to run the shortest path algorithm to compute an optimal SR LSP.
The establishment and data forwarding of SR LSPs are similar to those of LDP LSPs. SR
LSPs have no tunnel interfaces.
Creating an SR LSP
Creating an SR LSP involves the following operations:
Devices report topology information to a controller (if the controller is used to create a
tunnel) and are assigned labels.
The devices compute paths.
SR LSPs are created primarily using prefix labels. A destination node runs an IGP to advertise
prefix SIDs, and forwarders parse them and compute label values based on local SRGBs.
Each node then runs an IGP to collect topology information, runs the SPF algorithm to
calculate a label forwarding path, and delivers the computed next hop and outgoing label
(OuterLabel) to the forwarding table to guide data packet forwarding.
Table 1-356 describes the process of using prefix labels to create an LSP shown in Figure
1-1175.
Step Device Operation
2 C IS-IS calculates an outgoing label based on the following formula:
OuterLabel = SRGB start value advertised by the next-hop device + Prefix
SID value = 16000 + 100 = 16100
Here, the next-hop device is device D, which advertises the SRGB
(16000 to 65535).
3 B The calculation process is similar to that of C:
Label = 26000 + 100 = 26100
OuterLabel = 36000 + 100 = 36100
4 A The calculation process is similar to that of C:
Label = 6000 + 100 = 6100
OuterLabel = 26000 + 100 = 26100
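The arithmetic in the table above can be reproduced in a few lines: each node derives its outgoing label from the SRGB start value advertised by its next hop plus the prefix SID. The SRGB start values and the path toward device D follow the table.

```python
PREFIX_SID = 100
srgb_start = {"A": 6000, "B": 26000, "C": 36000, "D": 16000}  # per the table
next_hop = {"A": "B", "B": "C", "C": "D"}                      # path toward D

def outer_label(node):
    # OuterLabel = SRGB start value advertised by the next hop + Prefix SID
    return srgb_start[next_hop[node]] + PREFIX_SID

labels = {node: outer_label(node) for node in next_hop}
# C: 16000 + 100 = 16100; B: 36000 + 100 = 36100; A: 26000 + 100 = 26100
```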
Data Forwarding
Similar to MPLS, SR-TE operates on labels by pushing, swapping, or popping them.
Push: After a packet enters an SR LSP, the ingress adds a label between the Layer 2
header and the IP header. Alternatively, the ingress adds a label stack above the existing
label stack.
Swap: When packets are forwarded in an SR domain, a node searches the label
forwarding table for a label assigned by the next hop and swaps the label on the top of
the label stack with the matching label in each SR packet.
Pop: After a packet leaves an SR-TE tunnel, a node finds the outbound interface
mapped to the label on the top of the label stack and removes the top label.
Table 1-357 describes the data forwarding process on the network shown in Figure 1-1176.
Step Device Operation
1 A Receives a data packet, adds label 26100 to the packet, and forwards the
packet.
2 B Receives the labeled packet, swaps label 26100 for label 36100, and forwards
the packet.
3 C Receives the labeled packet, swaps label 36100 for label 16100, and forwards
the packet.
4 D Removes label 16100 and forwards the packet along a matching route.
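The four forwarding steps in the table above can be walked through in code. This is an illustrative sketch whose label values mirror the forwarding entries computed for prefix SID 100.

```python
ingress_push = 26100
swap_table = {"B": {26100: 36100}, "C": {36100: 16100}}

def forward(label, node):
    if node == "A":                  # step 1: push at the ingress
        return ingress_push
    if node in swap_table:           # steps 2-3: swap at transit nodes
        return swap_table[node][label]
    return None                      # step 4: pop at the egress (D)

label = None
trace = []
for node in ["A", "B", "C", "D"]:
    label = forward(label, node)
    trace.append(label)
# trace: [26100, 36100, 16100, None]
```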
explicit-null: PHP is not supported. The egress assigns an explicit-null label. The IPv4
explicit-null label value is 0. The MPLS EXP field is reserved, so QoS is supported. The
MPLS TTL processing is normal. Label resources on the egress are saved. If E2E services
carry QoS attributes to be contained in the EXP field in a label, an explicit-null label can
be used.
implicit-null: PHP is supported. The egress assigns an implicit-null label. The implicit-null
label value is 3. There is no MPLS EXP field on the egress, so QoS is not supported. There
is no MPLS TTL field on the egress, so it cannot be copied to the IP TTL field. The
forwarding burden on the egress is reduced, and forwarding efficiency is improved.
The Prefix-SID sub-TLV carries IGP-Prefix-SID information. Figure 1-1177 shows the format
of the Prefix-SID sub-TLV.
SID/Index/Label (variable length): This field contains either of the following information
based on the V and L flags:
A 4-byte index/offset value within a SID/label range. In this case, the V and L flags are
not set.
A 3-byte local label whose rightmost 20 bits are a label value. In this case, the V and L
flags must be set.
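The V/L-flag rule described above can be sketched as a small decoder. This is a hedged illustration of the field semantics, not real NE20E parsing code.

```python
def decode_sid_field(v_flag, l_flag, raw_bytes):
    """Interpret a SID/Index/Label field according to the V and L flags."""
    value = int.from_bytes(raw_bytes, "big")
    if v_flag and l_flag:
        # 3-byte field; the rightmost 20 bits carry the label value.
        return "label", value & 0xFFFFF
    if not v_flag and not l_flag:
        # 4-byte index/offset into a SID/label range.
        return "index", value
    raise ValueError("unsupported V/L flag combination")

# decode_sid_field(True, True, bytes([0x01, 0x86, 0xA0])) -> ("label", 100000)
```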
Adj-SID Sub-TLV
An Adj-SID Sub-TLV is optional and carries IGP Adjacency SID information. Figure 1-1179
shows its format.
Weight (8 bits): Weight. The Adj-SID weight is used for load balancing.
SID/Index/Label (variable length): This field contains either of the following information
based on the V and L flags:
A 3-byte local label whose rightmost 20 bits are a label value. In this case, the V and L
flags must be set.
SID/Label Sub-TLV
A SID/Label Sub-TLV includes a SID or an MPLS label. The SID/Label Sub-TLV is a part of
the SR-Capabilities Sub-TLV and SR Local Block Sub-TLV.
Figure 1-1182 shows the format of the SID/Label Sub-TLV.
SID/Label Sub-TLV (variable length): See SID/Label Sub-TLV. The SRGB start value is
included. When multiple SRGBs are configured, ensure that the SRGB sequence is correct
and the SRGBs do not overlap.
SR-Algorithm Sub-TLV
NEs use different algorithms, for example, the SPF algorithm and various SPF variant
algorithms, to compute paths to the other nodes or prefixes. The newly defined SR-Algorithm
Sub-TLV enables an NE to advertise its own algorithm. The SR-Algorithm Sub-TLV is also
carried in the IS-IS Router Capability TLV-242 for transfer. The SR-Algorithm Sub-TLV can
be propagated within the same IS-IS level.
Figure 1-1185 shows the format of the SR-Algorithm Sub-TLV.
The SRLB TLV advertised by the NE may contain a label range that is out of the SRLB. Such
a label range is assigned locally and is not advertised in the SRLB. For example, an adjacency
SID is assigned a local label, not a label within the SRLB range.
In Figure 1-1187, devices run IS-IS. Segment routing is used and enables each device to
advertise the SR capability and supported SRGB. In addition, the advertising end advertises a
prefix SID offset within the SRGB range. The receive end computes an effective label value
to generate a forwarding entry.
Devices A through F are deployed in areas of the same level. All devices run IS-IS. An SR
tunnel originates from Device A and is terminated at Device D.
An SRGB is configured on Device D. A prefix SID is set on the loopback interface of Device
D. Device D encapsulates the SRGB and prefix SID into a link state protocol data unit (LSP)
(for example, IS-IS Router Capability TLV-242 containing SR-Capability Sub-TLV) and
floods the LSP across the network. After another device receives the SRGB and prefix SID, it
uses them to compute a forwarding label, uses the IS-IS topology information, and runs the
Dijkstra algorithm to calculate an LSP and LSP forwarding entries.
An inter-IGP area SR LSP is created
In Figure 1-1188, to establish an inter-area SR LSP, the prefix SID must be advertised across
areas by penetrating these areas. This overcomes the restriction on IS-IS's flooding scope
within each area.
Devices A through D are deployed in different areas, and all devices run IS-IS. An SR tunnel
originates from Device A and is terminated at Device D.
An SRGB is configured on Device D. A prefix SID is set on the loopback interface of Device
D. Device D generates and delivers forwarding entries. It encapsulates the SRGB and prefix
SID into an LSP (for example, IS-IS Router Capability TLV-242 containing SR-Capability
Sub-TLV) and floods the LSP across the network. Upon receipt of the LSP, Device C parses
the LSP to obtain the prefix SID, calculates and delivers forwarding entries, and penetrates the
prefix SID and prefix address to the Level-2 area. Device B parses the LSP to obtain the
prefix SID, calculates and delivers forwarding entries, and penetrates the prefix SID and
prefix address to the Level-1 area. Device A parses the LSP and obtains the prefix SID, uses
IS-IS to collect topology information, and runs the Dijkstra algorithm to compute a label
switched path and tunnel forwarding entries.
1.13.2.3.4 SR-TE
SR-Traffic Engineering (SR-TE) is a new Multiprotocol Label Switching (MPLS) Traffic
Engineering (TE) tunneling technique implemented based on an Interior Gateway Protocol
(IGP) extension. The controller calculates a path for an SR-TE tunnel and forwards a
computed label stack to the ingress configured on a forwarder. The ingress uses the label stack
to generate an LSP in the SR-TE tunnel. Therefore, the label stack is used to control the path
along which packets are transmitted on a network.
SR-TE Advantages
SR-TE tunnels are capable of meeting the rapid development requirements of
software-defined networking (SDN), which Resource Reservation Protocol-TE (RSVP-TE)
tunnels are unable to meet. Table 1-366 compares SR-TE with RSVP-TE.
Related Concepts
Label Stack
A label stack is a set of Adjacency Segment labels in the form of a stack stored in a packet
header. Each Adjacency SID label in the stack identifies an adjacency to a local node, and the
label stack describes all adjacencies along an SR-TE LSP. In packet forwarding, a node
searches for an adjacency mapped to each Adjacency Segment label in a packet, removes the
label, and forwards the packet. After all labels are removed from the label stack, the packet is
sent out of an SR-TE tunnel.
Stitching Label and Stitching Node
If a label stack depth exceeds that supported by a forwarder, the label stack cannot carry all
adjacency labels on a whole LSP. In this situation, the controller assigns multiple label stacks
to the forwarder. The controller delivers a label stack to an appropriate node and assigns a
special label to associate label stacks to implement segment-based forwarding. The special
label is a stitching label, and the appropriate node is a stitching node.
The controller assigns a stitching label at the bottom of a label stack to a stitching node. After
a packet arrives at the stitching node, the stitching node swaps a label stack associated with
the stitching label based on the label-stack mapping. The stitching node forwards the packet
based on the label stack for the next segment.
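The stitching operation can be sketched as follows; the label values match the example used later in this section and are otherwise illustrative.

```python
# Stitching label -> label stack for the next path segment.
stitching_table = {100: [1005, 1009, 1010]}

def stitch(label_stack, table):
    """At a stitching node, swap a stitching label at the top of the
    remaining stack for the label stack of the next segment."""
    if label_stack and label_stack[0] in table:
        return list(table[label_stack[0]])
    return label_stack

# A packet arriving at the stitching node with only label 100 left
# leaves with the next segment's full stack.
print(stitch([100], stitching_table))  # [1005, 1009, 1010]
```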
IS-IS SR is enabled on PE1, PE2, and P1 through P4 to establish IS-IS neighbor relationships
between each pair of directly connected nodes. In SR-capable IS-IS instances, each outbound
IS-IS interface is assigned an SR Adjacency Segment label. SR IS-IS advertises the
Adjacency Segment labels across a network. P3 is used as an example. In Figure 1-1189,
IS-IS-based label allocation is as follows:
1. P3 runs IS-IS to apply for a local dynamic label for an adjacency. For example, P3
assigns adjacency label 9002 to the P3-to-P4 adjacency.
2. P3 runs IS-IS to advertise the adjacency label and flood it across the network.
3. P3 uses the label to generate a label forwarding table.
4. After the other nodes on the network run IS-IS to learn the Adjacency Segment label
advertised by P3, the nodes do not generate local forwarding tables.
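Step 1 above can be sketched as follows; the starting label value matches the P3 example, and the function is an illustrative model of local dynamic label allocation, not device code.

```python
import itertools

def allocate_adjacency_labels(neighbors, first_label=9002):
    """Assign one local dynamic label per IS-IS adjacency and build the
    node's local label forwarding table. Other nodes learn these labels
    through flooding but, as noted above, install no entries for them."""
    labels = itertools.count(first_label)
    return {next(labels): neighbor for neighbor in neighbors}

# P3 assigns adjacency label 9002 to its P3-to-P4 adjacency.
print(allocate_adjacency_labels(["P4"]))  # {9002: 'P4'}
```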
PE1, P1, P2, P3, and P4 assign and advertise adjacency labels in the same way as P3 does.
The label forwarding table is then generated on each node. Each node establishes an IS-IS
neighbor relationship with the controller, generates topology information, including SR labels,
and reports the topology information to the controller.
SR-TE Tunnel
Segment Routing Traffic Engineering (SR-TE) runs the SR protocol and uses TE constraints
to create a tunnel.
In Figure 1-1190, a primary LSP is established along the path PE1->P1->P2->PE2, and a
backup LSP is established along the path PE1->P3->P4->PE2. The two LSPs share the same
SR-TE tunnel ID. Each LSP originates from the ingress, passes through transit nodes, and is
terminated at the egress.
SR-TE tunnel establishment involves configuring and then setting up the tunnel. Before
an SR-TE tunnel is created, IS-IS neighbor relationships must be established between
forwarders to implement network layer connectivity, assign labels, and collect network
topology information. Forwarders send the label and network topology information to the
controller, which uses the information to calculate paths.
Figure 1-1191 Networking for SR-TE tunnels established using configurations that the
controller delivers to forwarders over NETCONF
1. The controller uses SR-TE tunnel constraints and Path Computation Element (PCE) to
calculate paths and combines adjacency labels into a label stack that is the calculation
result.
If the label stack depth exceeds the upper limit supported by a forwarder, a single label
stack cannot carry all the labels for an entire path, and the controller must divide the
path's labels into multiple label stacks.
In Figure 1-1191, the controller calculates a path PE1->P3->P1->P2->P4->PE2 for an
SR-TE tunnel. The path is mapped to two label stacks {1003, 1006, 100} and {1005,
1009, 1010}. Label 100 is a stitching label, and the others are adjacency labels.
2. The controller runs NETCONF to deliver the label stacks to the forwarder.
In Figure 1-1191, the process of delivering label stacks on the controller is as follows:
a. The controller delivers label stack {1005, 1009, 1010} to P1 and assigns a stitching
label of value 100 associated with the label stack. Label 100 is the bottom label in
the label stack on PE1.
b. The controller delivers label stack {1003, 1006, 100} to the ingress PE1.
3. The forwarder uses the delivered label stacks to establish an LSP for an SR-TE tunnel.
An SR-TE tunnel does not support MTU negotiation. Therefore, the MTUs configured on nodes along
the SR-TE tunnel must be the same. If an SR-TE tunnel is created manually, set an MTU value on the
tunnel interface or use the default MTU of 1500 bytes. On the manual SR-TE tunnel, the smallest value
in the following values takes effect: MTU of the tunnel, MPLS MTU of the tunnel, MTU of the
outbound interface, and MPLS MTU of the outbound interface.
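The MTU selection rule in the note above reduces to taking the minimum of the four values; a sketch (the values are illustrative):

```python
def effective_tunnel_mtu(tunnel_mtu, tunnel_mpls_mtu,
                         out_if_mtu, out_if_mpls_mtu):
    """On a manual SR-TE tunnel, the smallest of the four configured
    values takes effect."""
    return min(tunnel_mtu, tunnel_mpls_mtu, out_if_mtu, out_if_mpls_mtu)

# Default tunnel MTU of 1500 bytes, but an outbound interface whose
# MPLS MTU is 1492: packets on the tunnel are limited to 1492 bytes.
print(effective_tunnel_mtu(1500, 1500, 1500, 1492))  # 1492
```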
In Figure 1-1192, the SR-TE path calculated by the controller is A -> B -> C -> D -> E -> F.
The path is mapped to two label stacks {1003, 1006, 100} and {1005, 1009, 1010}. The two
label stacks are delivered to ingress A and stitching node C, respectively. Label 100 is a
stitching label and is associated with label stack {1005, 1009, 1010}. The other labels are
adjacency labels. The process of forwarding data packets along the SR-TE tunnel is as
follows:
1. Ingress A adds the label stack {1003, 1006, 100} to the packet, matches the outer label
of 1003 against its adjacencies, and finds the A-to-B adjacency as the outbound interface.
Ingress A strips label 1003 from the label stack {1003, 1006, 100} and forwards the
packet downstream through the A-to-B outbound interface.
2. Node B matches the outer label of 1006 against its adjacencies and finds the B-to-C
adjacency as the outbound interface. Node B strips label 1006 from the label stack
{1006, 100}. The packet carrying the label stack {100} travels through the B-to-C
adjacency to the downstream node C.
3. After stitching node C receives the packet, it identifies stitching label 100 by querying
the stitching label entries and swaps the label for the associated label stack {1005, 1009,
1010}. Stitching node C uses the top label 1005 to find the outbound interface
connected to the C-to-D adjacency and removes label 1005. Stitching node C forwards
the packet carrying the label stack {1009, 1010} along the C-to-D adjacency to the
downstream node D. For more details about stitching labels and stitching nodes, see
1.13.2.3.4 SR-TE.
4. After nodes D and E receive the packet, they treat the packet in the same way as node B.
Node E removes the last label 1010 and forwards the data packet to node F.
5. Egress F receives the packet without a label and forwards the packet along a route that is
found in a routing table.
The preceding information shows that after adjacency labels are manually specified, devices
strictly forward the data packets hop by hop along the explicit path designated in the label
stack. This forwarding method is also called strict explicit-path SR-TE.
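The numbered per-hop steps above can be reproduced with a small simulation. The adjacency tables encode the illustrative labels from this example (1003 = A-to-B, 1006 = B-to-C, 1005 = C-to-D, 1009 = D-to-E, 1010 = E-to-F), with node C holding the stitching entry for label 100; this is a sketch of the forwarding logic, not device behavior.

```python
ADJACENCIES = {  # node -> {adjacency label: next hop}
    "A": {1003: "B"}, "B": {1006: "C"}, "C": {1005: "D"},
    "D": {1009: "E"}, "E": {1010: "F"},
}
STITCHING = {"C": {100: [1005, 1009, 1010]}}  # stitching node entries

def forward(node, stack):
    """Forward a packet hop by hop: at a stitching node, swap the
    stitching label for the next stack; otherwise pop the outer label
    to pick the outbound adjacency."""
    hops = [node]
    while stack:
        table = STITCHING.get(node, {})
        if stack[0] in table:                 # stitching: swap in next stack
            stack = list(table[stack[0]])
        label, stack = stack[0], stack[1:]    # pop the outer label
        node = ADJACENCIES[node][label]       # adjacency -> next hop
        hops.append(node)
    return hops

print(forward("A", [1003, 1006, 100]))  # ['A', 'B', 'C', 'D', 'E', 'F']
```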
On the network shown in Figure 1-1193, a node+adjacency mixed label stack is configured.
On the ingress node A, the mixed label stack is {1003, 1006, 1005, 101}. Labels 1003, 1006
and 1005 are adjacency labels, and label 101 is a node label.
1. Node A finds an A-B outbound interface based on label 1003 on the top of the label
stack. Node A removes label 1003 and forwards the packet to the next hop node B.
2. Similar to node A, node B finds the outbound interface mapped to label 1006 on the top
of the label stack. Node B removes label 1006 and forwards the packet to the next hop
node C.
3. Similar to node A, node C finds the outbound interface mapped to label 1005 on the top
of the label stack. Node C removes label 1005 and forwards the packet to the next hop
node D.
4. Node D processes node label 101 on the top of the label stack. This label is used for load
balancing. Node D replaces the label with label 201 or 301 and forwards the packet to
node E or G, respectively. Traffic is balanced across the links based on 5-tuple information.
5. After receiving the packets carrying node label 201 or 301, penultimate-hop nodes E and
G remove the labels and forward the packets to node F, completing end-to-end traffic
forwarding.
The preceding information shows that after adjacency and node labels are manually specified,
a device can forward the data packets along the shortest path or load-balance the data packets
over paths. The paths are not fixed, and therefore, this forwarding method is called loose
explicit-path SR-TE.
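The load balancing in step 4 above hinges on 5-tuple hashing. The following sketch shows how a node might pick one of the two downstream (next hop, label) pairs by hashing a flow's 5-tuple; the hash function and bucket mapping are assumptions, since real devices use their own hardware hash.

```python
import hashlib

# Hash bucket -> (next hop, node label), per the example above.
PATHS = {0: ("E", 201), 1: ("G", 301)}

def pick_path(src_ip, dst_ip, protocol, src_port, dst_port):
    """Choose a downstream path by hashing the flow's 5-tuple, so all
    packets of one flow follow the same path."""
    key = f"{src_ip}|{dst_ip}|{protocol}|{src_port}|{dst_port}".encode()
    bucket = hashlib.sha256(key).digest()[0] % len(PATHS)
    return PATHS[bucket]

# Packets of the same flow always map to the same (next hop, label).
flow = ("10.0.0.1", "10.0.0.2", 6, 1234, 80)
assert pick_path(*flow) == pick_path(*flow)
```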
Static Route
Static routes on an SR-TE tunnel work in the same way as common static routes. When
configuring a static route, set the outbound interface of a static route to an SR-TE tunnel
interface so that traffic transmitted over the route is directed to the SR-TE tunnel.
Tunnel Policy
By default, VPN traffic is forwarded through LDP LSPs, not SR LSPs or SR-TE tunnels. If
the default LDP LSPs cannot meet VPN traffic requirements, a tunnel policy is used to direct
VPN traffic to an SR LSP or an SR-TE tunnel.
The tunnel policy may be a tunnel type prioritizing policy or a tunnel binding policy. Select
either of the following policies as needed:
Select-seq mode: This policy changes the type of tunnel selected for VPN traffic. An SR
LSP or SR-TE tunnel is selected as a public tunnel for VPN traffic based on the
prioritized tunnel types. If no LDP LSPs are available, SR LSPs are selected by default.
Tunnel binding mode: This policy defines a specific destination IP address, and this
address is bound to an SR-TE tunnel for VPN traffic to guarantee QoS.
Auto Route
An IGP uses an auto route related to an SR-TE tunnel, which functions as a logical link, to
compute a path. The tunnel interface is used as the outbound interface in the auto route.
According to the network plan, a node determines whether an LSP link is advertised to a
neighbor node for packet forwarding. An auto route is configured using either of the following
methods:
Forwarding shortcut: The node does not advertise an SR-TE tunnel to its neighbor nodes.
The SR-TE tunnel can be involved only in local route calculation, but cannot be used by
the other nodes.
Forwarding adjacency: The node advertises an SR-TE tunnel to its neighbor nodes. The
SR-TE tunnel is involved in global route calculation and can be used by the other nodes.
Forwarding shortcut and forwarding adjacency are mutually exclusive, and cannot be used
simultaneously.
When the forwarding adjacency is used, a reverse tunnel must be configured for a routing protocol
to perform bidirectional check after a node advertises LSP links to the other nodes. The forwarding
adjacency must be enabled for both tunnels in opposite directions.
Policy-Based Routing
Policy-based routing (PBR) allows a device to select routes based on user-defined
policies, which improves traffic security and balances traffic. If PBR is enabled on an SR
network, IP packets are forwarded over specific LSPs based on PBR rules.
SR-TE PBR, the same as IP unicast PBR, is implemented by defining a set of matching rules
and behaviors. The rules and behaviors are defined using the apply clause with an SR-TE
tunnel interface used as an outbound interface. If packets do not match PBR rules, they are
properly forwarded using IP; if they match PBR rules, they are forwarded over specific
tunnels.
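The match-then-apply behavior described above can be sketched as a rule lookup; the rule fields and interface name are illustrative, not the device's actual configuration model.

```python
# Each rule pairs matching criteria with an apply clause naming the
# SR-TE tunnel interface to use as the outbound interface.
RULES = [
    {"dst_prefix": "192.168.1.", "apply_out_interface": "Tunnel1"},
]

def pbr_lookup(dst_ip):
    """Return the tunnel interface for a matching rule, or fall back to
    ordinary IP forwarding when no rule matches."""
    for rule in RULES:
        if dst_ip.startswith(rule["dst_prefix"]):
            return rule["apply_out_interface"]
    return "ip-forwarding"

print(pbr_lookup("192.168.1.7"))  # Tunnel1
print(pbr_lookup("172.16.0.9"))   # ip-forwarding
```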
The network shown in Figure 1-1196 consists of inconsecutive L3VPN subnets with a
backbone network in between. PEs establish an SR LSP to forward L3VPN packets. PEs run
BGP to learn VPN routes. The deployment is as follows:
An IS-IS neighbor relationship is established between each pair of directly connected
devices on the public network to implement route reachability.
A BGP peer relationship is established between the two PEs to learn peer VPN routes of
each other.
The PEs establish an IS-IS SR LSP to assign public network labels and compute a label
switched path.
BGP is used to assign a private network label, for example, label Z, to a VPN instance.
VPN routes are iterated to the SR LSP.
PE1 receives an IP packet, adds the private network label and SR public network label to
the packet, and forwards the packet along the label switched path.
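The encapsulation in the last step can be sketched as a two-label push; the label values are illustrative. The private (VPN) network label is pushed first, so the SR public label ends up outermost on the wire.

```python
def encapsulate_l3vpn(ip_packet, private_label, sr_public_label):
    """Push the BGP private (VPN) label, then the SR public label, so
    the public label is outermost when the packet leaves PE1."""
    return {"labels": [sr_public_label, private_label], "payload": ip_packet}

pkt = encapsulate_l3vpn("ip-payload", private_label=1027,
                        sr_public_label=16100)
print(pkt["labels"])  # [16100, 1027] -> public label outermost
```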
HVPN
On a growing network with increasing types of services, PEs encounter scalability problems,
such as insufficient access or routing capabilities, which reduces network performance and
scalability. In this situation, VPNs cannot be deployed in a large scale. In Figure 2, on a
hierarchical VPN (HVPN), PEs play different roles and provide various functions. These PEs
form a hierarchical architecture to provide functions that are provided by one PE on a
non-hierarchical VPN. HVPNs lower the performance requirements for PEs.
inner label L4 and an outer label Lv to the packet and sends the packet to the SPE over
the corresponding LSP. The label stack is L4/Lv.
3. After receiving the packet, the SPE replaces the outer label Lv with Lu and the inner
label L2 with L3. Then, the SPE sends the packet to the NPE over the same LSP.
4. After receiving the packet, the NPE removes the outer label Lu, searches for a VPN
instance corresponding to the packet based on the inner label L3, and removes the inner
label L3 after the VPN instance is found. Then, the NPE searches the VPN forwarding
table of this VPN instance for the outbound interface of the packet based on the
destination address of the packet. The NPE sends the packet through this outbound
interface to CE2. The packet sent by the NPE is a pure IP packet with no label.
VPN FRR
In Figure 1-1198, PE1 adds the optimal route advertised by PE3 and less optimal route
advertised by PE4 into a forwarding entry. The optimal route is used to guide traffic
forwarding, and the less optimal route is used as a backup route.
P1-to-P3 link failure:
PE1 does not support BFD for SR-BE and cannot detect an LSP Down event. As a
result, PE2 cannot perform VPN FRR switching to switch traffic to PE4 along
LSP3 over a path in Figure 1-1198.
After IS-IS FRR is configured, P1 performs FRR switching to switch traffic to
LSP2 over the path PE1->P1->P2->P4->P3->PE3, shown in Figure 1-1198.
After IS-IS FRR is configured, SR-BE LSP hard convergence is performed on the P
node. Traffic switches to LSP2 over the
In Figure 1-1200, after the PEs learn the MAC addresses of VPN sites and establish a public
network SR LSP, the PEs can transmit unicast packets to the other site. The packet
transmission process is as follows:
1. CE1 sends unicast packets based on Layer 2 forwarding to PE1.
2. After PE1 receives the packets, PE1 encapsulates a private network label carried in a
MAC entry and a public network SR label in sequence and sends the packets to PE2.
3. After PE2 receives the encapsulated unicast packets, PE2 performs decapsulation,
removes the private network label, and searches the private network MAC table for a
matching outbound interface.
the situation when the label stack is withdrawn. Therefore, BFD must be used to monitor