
HCIE-R&S

Huawei Certification

HCIE-R&S
Huawei Certified Internetwork Expert-Routing and Switching

Huawei Technologies Co.,Ltd

HUAWEI TECHNOLOGIES

HCIE

Copyright Huawei Technologies Co., Ltd. 2010. All rights reserved.

No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.
Trademarks and Permissions

and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of
their respective holders.
Notice

The information in this document is subject to change without notice. Every effort
has been made in the preparation of this document to ensure accuracy of the contents,
but all statements, information, and recommendations in this document do not
constitute a warranty of any kind, express or implied.


Huawei Certification System


Relying on its strong technical strength and professional training system, and in accordance with customers' different levels of ICT technology, Huawei certification is committed to providing customers with authentic, professional certification. Based on the characteristics of ICT technologies and customers' needs at different levels, Huawei certification provides customers with a four-level certification system.
HCDA (Huawei Certification Datacom Associate) is aimed primarily at IP network maintenance engineers and anyone else who wants to build an understanding of the IP network. HCDA certification covers TCP/IP basics, routing, switching, and other common foundational knowledge of IP networks, together with Huawei communications products and the characteristics and basic maintenance of the Versatile Routing Platform (VRP).
HCDP-Enterprise (Huawei Certification Datacom Professional-Enterprise) is aimed at enterprise-class network maintenance engineers, network design engineers, and anyone else who wants an in-depth grasp of routing, switching, and network adjustment and optimization technologies. HCDP-Enterprise consists of IESN (Implement Enterprise Switch Network), IERN (Implement Enterprise Routing Network), and IENP (Improving Enterprise Network Performance), which cover advanced IPv4 routing and switching technology principles, network security, high availability, and QoS, as well as the configuration of Huawei products.
HCIE-Enterprise (Huawei Certified Internetwork Expert-Enterprise) is designed to equip engineers with a variety of IP technologies and proficiency in the maintenance, diagnostics, and troubleshooting of Huawei products, as well as competence in planning, designing, and optimizing large-scale IP networks.


Referenced icon

Router

L3 Switch

L2 Switch

Firewall

Serial line

Ethernet line


Net cloud


CONTENTS
RIP ..................................................................................................................................................... 7
IS-IS.................................................................................................................................................. 59
OSPF .............................................................................................................................................. 123
BGP BASICS .................................................................................................................................... 196
BGP ADVANCED AND INTERNET DESIGN ........................................................................................ 266
ROUTE IMPORT AND CONTROL ...................................................................................................... 334
VLAN .............................................................................................................................................. 393
LAN LAYER 2 TECHNOLOGIES ......................................................................................................... 448
WAN LAYER 2 TECHNOLOGIES........................................................................................................ 496
STP ................................................................................................................................................. 548
MULTICAST .................................................................................................................................... 636
IPv6 ................................................................................................................................................ 719
MPLS VPN ...................................................................................................................................... 805
OTHER TECHNOLOGIES .................................................................................................................. 841


RIPv1 packet format


A RIP packet consists of two parts: Header and Route Entries.
The Header includes the Command and Version fields. Route
Entries include at most 25 routing entries. Each routing entry
contains the Address Family Identifier field, IP Address of
target network, and Metric field.
The meaning of each field in a RIP packet is as follows:
Command: indicates whether the packet is a request or response.
The value is 1 or 2. The value 1 indicates a request, and the value 2
indicates a response.
Version: specifies the used RIP version. The value 1 indicates a
RIPv1 packet, and the value 2 indicates a RIPv2 packet.
Address Family Identifier: specifies the used address family. The
value is 2 for IPv4. If the packet is a request for the entire routing
table, the value is 0.
IP Address: specifies the destination address for the routing entry.
The value can be a network address or host address.
Metric: indicates the hop count to the destination network. Although the 32-bit field can encode values from 0 to 2^32 − 1, the value used by RIP ranges from 1 to 16, where 16 indicates an unreachable route.
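As a hypothetical illustration (not taken from any Huawei implementation), the layout described above can be decoded with Python's struct module; the sample packet and field values here are assumptions for the sketch:

```python
# Sketch of the RIPv1 layout described above: a 4-byte header (Command,
# Version, must-be-zero) followed by 20-byte route entries (AFI,
# must-be-zero, IP Address, two zero words, Metric).
import struct

def parse_ripv1(data: bytes):
    command, version, _ = struct.unpack("!BBH", data[:4])
    entries = []
    for off in range(4, len(data), 20):
        afi, _, addr, _, _, metric = struct.unpack("!HH4s4s4sI", data[off:off + 20])
        entries.append({"afi": afi,
                        "address": ".".join(str(b) for b in addr),
                        "metric": metric})
    return {"command": command, "version": version, "entries": entries}

# A response (Command = 2, Version = 1) carrying one route to 10.0.0.0, metric 3
pkt = struct.pack("!BBH", 2, 1, 0) + struct.pack(
    "!HH4s4s4sI", 2, 0, bytes([10, 0, 0, 0]), b"\x00" * 4, b"\x00" * 4, 3)
parsed = parse_ripv1(pkt)
```

Each route entry is exactly 20 bytes, so 25 entries plus the 4-byte header give the 504-byte maximum message noted below.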

RIPv1 characteristics
RIP is a UDP-based routing protocol. A RIP message consists of a 4-byte RIP header and up to 25 route entries of 20 bytes each, so the maximum message size is 4 + (25 × 20) = 504 bytes; together with the 8-byte UDP header, a RIP packet excluding the IP header is at most 512 bytes. A RIPv1 packet does not carry mask information: RIPv1 sends and receives routes based on the natural (classful) network mask and the interface address mask. Therefore, RIPv1 supports neither route summarization nor discontiguous subnets. RIPv1 packets also carry no authentication field, so RIPv1 does not support authentication.

RIPv2 packet format


A RIPv2 packet has the same format as a RIPv1 packet
except that RIPv2 uses some new and unused fields in RIPv1
to provide extended functions.
The meaning of the new fields is as follows:
Route Tag: indicates external routes learned from other
protocols or routes imported into RIPv2.
Subnet Mask: identifies the subnet mask of an IPv4 address.
Next Hop: indicates a next-hop address that is better than
the advertising router address. The value 0.0.0.0
indicates that the advertising router address is the
optimal next-hop address.
When authentication is configured in RIPv2, RIPv2 modifies
the first Route Entries:
Changes the Address Family Identifier field to 0xFFFF.
Changes the Route Tag field to the Authentication Type field.
Changes the IP Address, Subnet Mask, Next Hop, and
Metric fields to the Password field.
Compared with RIPv1, RIPv2 has the following advantages:
Supports route tags. Route tags are used in routing policies to
flexibly control routes. Tags can also be used when RIP
processes import routes from each other.
Supports subnet masks, route summarization, and CIDR.

Supports specified next hops to select the optimal next-hop address on a broadcast network.
Multicasts route updates. Only RIPv2-running devices can
receive protocol packets, reducing resource consumption.
Supports packet authentication to enhance security.
On a broadcast network with more than two devices, the Next Hop field
changes to optimize the path.

In MD5 authentication, an MD5 digest is computed over the route entries and the shared key. A router then sends the digest together with the route entries to the neighbor.
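The digest computation can be sketched as follows. This is a simplified model of RFC 2082-style keyed MD5 (the real procedure also covers the RIP header and an authentication trailer, and pads the key to a fixed length); hashlib stands in for the router's MD5 routine:

```python
# Simplified sketch of RIPv2 MD5 authentication: a 16-byte MD5 digest is
# computed over the packet contents with the padded secret key appended;
# the receiver repeats the computation and compares digests.
import hashlib

def rip_md5_digest(packet_body: bytes, key: bytes) -> bytes:
    padded_key = key.ljust(16, b"\x00")[:16]   # key padded/truncated to 16 bytes
    return hashlib.md5(packet_body + padded_key).digest()

body = b"\x02\x02\x00\x00" + b"\x00" * 20      # header + one route entry (sample)
tx_digest = rip_md5_digest(body, b"huawei")    # sender computes the digest
rx_digest = rip_md5_digest(body, b"huawei")    # receiver recomputes with the same key
```

Because only the digest (never the key itself) is transmitted, a neighbor with a different key computes a different digest and the packet fails authentication.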

RIP mainly uses three timers:


Update timer: defines the interval between two route updates.
It periodically triggers the transmission of route updates at a
default interval of 30 seconds.
Aging timer: specifies the aging time of routes. If a RIP device
does not receive the update of a route from its neighbor within
the aging time, the RIP device considers the route as
unreachable. After the aging timer expires, the RIP device sets
the metric of the route to 16.
Garbage-collect timer: specifies the interval between the time a route is marked as unreachable and the time it is deleted from the
routing table. The default interval is four times the update
interval, namely, 120 seconds. If the RIP device does not
receive the update of an unreachable route from the same
neighbor within the garbage-collect time (defaults to 120
seconds), the RIP device deletes the route from the routing
table.
Relationship between three timers:
RIP route update advertisement is controlled by the update
timer. A route update is sent at a default interval of 30 seconds.
Each routing entry has two timers: the aging timer and the garbage-collect timer. When a route is learned and added to the routing
table, the aging timer starts. If a RIP device does not receive
the update of the route from a neighbor when the aging timer
expires, the device sets the metric of the route to 16
(indicating an unreachable route) and starts the garbage-collect
timer.

If the device still does not receive the update of the unreachable route from the neighbor when the garbage-collect
timer expires, the device deletes the route from the routing
table.

Precautions
If a RIP device does not have the triggered update function, it
deletes an unreachable route from the routing table after a
maximum of 300 seconds (aging time plus garbage-collect
time).
If a RIP device has the triggered update function, it deletes an
unreachable route from the routing table after a maximum of
120 seconds (the garbage-collect time).
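The relationship between the aging and garbage-collect timers can be sketched as a small state function, assuming the default 180-second aging time and 120-second garbage-collect time described above:

```python
# Sketch of the per-route aging / garbage-collect state machine with the
# default timers (update 30 s, aging 180 s, garbage-collect 120 s).
AGING, GARBAGE = 180, 120

def route_state(seconds_since_last_update: int) -> str:
    """Return the state of a route given the time since its last update."""
    if seconds_since_last_update < AGING:
        return "active"                      # aging timer still running
    if seconds_since_last_update < AGING + GARBAGE:
        return "unreachable (metric 16)"     # garbage-collect timer running
    return "deleted"                         # removed from the routing table
```

This also makes the 300-second worst case visible: 180 seconds of aging plus 120 seconds of garbage collection before the route disappears.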

Split horizon
RIP uses split horizon to reduce bandwidth consumption and
prevent routing loops.
Implementation
R1 sends R2 a route to network 10.0.0.0/8. If split horizon is
not configured, R2 sends the route learned from R1 back to R1.
In this manner, R1 can learn two routes to network 10.0.0.0/8:
one direct route with zero hops and the other route with two
hops and R2 as the next hop.
However, only the direct route is active in the RIP routing table
of R1. When the route from R1 to network 10.0.0.0/8 becomes
unreachable and R2 does not receive route unreachable
information, R2 continues sending route information indicating
that network 10.0.0.0/8 is reachable to R1. Subsequently, R1
receives incorrect route information and considers that it can
reach network 10.0.0.0/8 through R2; R2 still considers that it
can reach network 10.0.0.0/8 through R1. As a result, a routing
loop occurs. After split horizon is configured, R2 does not send
the route to network 10.0.0.0/8 back to R1, preventing a
routing loop.
Precautions
Split horizon is disabled on NBMA networks by default.

Poison reverse function


Poison reverse helps delete useless routes from the routing
table of the peer end.
Implementation
If poison reverse is configured, after receiving the route 10.0.0.0/8 from R1, R2 advertises the route back to R1 with the metric set to 16, indicating that the route is unreachable. R1 therefore does not use the route 10.0.0.0/8 learned from R2, preventing a routing loop.
Precautions
Poison reverse is disabled by default. Generally, split horizon
is enabled on Huawei devices (except on NBMA networks)
and poison reverse is disabled.
Comparisons between split horizon and poison reverse
Both split horizon and poison reverse can prevent routing
loops in RIP. The difference between them is as follows: Split
horizon avoids advertising a route back to neighbors along the
same path to prevent routing loops, while poison reverse
marks a route as unreachable and advertises the route back to
neighbors along the same path to prevent routing loops.
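The contrast can be sketched as a toy update builder; the route-table shape and interface names are assumptions for illustration only:

```python
# Sketch contrasting split horizon and poison reverse when building the
# update sent out of an interface. routes maps prefix -> (metric, learned_if).
INFINITY = 16

def build_update(routes, out_if, mode):
    update = {}
    for prefix, (metric, learned_if) in routes.items():
        if learned_if == out_if:
            if mode == "split-horizon":
                continue                     # do not advertise back at all
            if mode == "poison-reverse":
                update[prefix] = INFINITY    # advertise back as unreachable
                continue
        update[prefix] = metric
    return update

routes = {"10.0.0.0/8": (2, "eth0"), "172.16.0.0/16": (3, "eth1")}
```

With split horizon the route learned on eth0 is simply omitted from updates sent out of eth0; with poison reverse it is included but poisoned with metric 16.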

Triggered update
Triggered update can shorten the network convergence time.
When a routing entry changes, a RIP device broadcasts the
change to other devices immediately without waiting for
periodic update. If triggered update is not configured, by
default, invalid routes are retained in the routing table for a
maximum of 300 seconds (aging time plus garbage-collect
time).
Update is not triggered when the next-hop address becomes
unreachable.
Implementation
After R1 detects a network fault, it sends a route update to R2
immediately without waiting for the expiry of the update timer.
Subsequently, the routing table of R2 is updated in a timely
manner.

Route summarization
RIPv2 supports route summarization. Because RIPv2 packets
carry the mask, RIPv2 supports subnetting. Route
summarization can improve scalability and efficiency of large
networks and reduce the routing table size.
RIPv2 process-based classful summarization can implement
automatic summarization.
Interface-based summarization can implement manual
summarization.
If the routes to be summarized carry tags, the tags are deleted
after these routes are summarized into one summary route.
Case
Two routes, 10.1.0.0/16 (metric = 10) and 10.2.0.0/16 (metric = 2), are summarized into one natural network segment route, 10.0.0.0/8, which takes the smallest metric of the specific routes (metric = 2).
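The case above can be sketched with Python's ipaddress module; taking the smallest specific metric for the summary is assumed from the case values:

```python
# Sketch of classful (automatic) summarization: specific routes collapse to
# their natural class network, and the summary keeps the smallest metric.
import ipaddress

def classful_summary(routes):
    """routes: {prefix_str: metric} -> {natural_class_prefix: metric}."""
    summary = {}
    for prefix, metric in routes.items():
        net = ipaddress.ip_network(prefix)
        first_octet = net.network_address.packed[0]
        # Class A (< 128) -> /8, Class B (< 192) -> /16, Class C -> /24
        bits = 8 if first_octet < 128 else 16 if first_octet < 192 else 24
        natural = str(net.supernet(new_prefix=bits))
        summary[natural] = min(metric, summary.get(natural, 16))
    return summary

result = classful_summary({"10.1.0.0/16": 10, "10.2.0.0/16": 2})
```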

Working process analysis:


Initial state: A router starts a RIP process, associates an
interface with the RIP process, and sends as well as receives
RIP packets from the interface.
Build a routing table: The router builds its routing entries
according to received RIP packets.
Maintain the routing table: The router sends and receives a
route update at an interval of 30 seconds to maintain its
routing entries.
Age routing entries: The router starts a 180-second timer for its
routing entries. If the router receives route updates within 180
seconds, it resets the aging timer.
Garbage collect entries: If the router does not receive the
update of a route after 180 seconds, it starts the 120-second
garbage-collect timer and sets the metric of the route to 16.
Delete routing entries: If the router still does not receive the
update of the route after 120 seconds, it deletes the route from
the routing table.

Case description
In this case, R1, R2, and R3 reside on network 192.168.1.0/24;
R3, R4, and R5 reside on network 192.168.2.0/24. All the
routers run RIPv2 and advertise IP addresses of connected
interfaces. To control route selection on R3, modify the metric
of routes.
Remarks
In the IP routing table, only some related routing entries are
displayed. In the Flags field of the route, R indicates an
iterated route, and D indicates that the route is delivered to the
FIB table.
The route iteration process is as follows: when the next hop of a route to the destination address cannot be directly matched to an outbound interface of the device, the device looks up the next-hop address in the routing table again, repeating the process until the route is iterated to the correct outbound interface for forwarding.
The FIB table is the route forwarding table that is generated by
the routing table. You can run the display fib command to
view the forwarding table.
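The iteration described above can be sketched as follows; the table shape and the interface name GE0/0/1 are assumptions for illustration:

```python
# Sketch of route iteration: a next hop that is not directly connected is
# looked up again until the route resolves to an outbound interface.
import ipaddress

def resolve(table, dest, max_depth=8):
    """table maps prefix -> ('iface', name) for direct routes or ('via', next_hop)."""
    for _ in range(max_depth):                # guard against resolution loops
        match = None
        for prefix, target in table.items():
            net = ipaddress.ip_network(prefix)
            if ipaddress.ip_address(dest) in net:
                if match is None or net.prefixlen > ipaddress.ip_network(match[0]).prefixlen:
                    match = (prefix, target)  # longest-prefix match wins
        if match is None:
            return None                       # no route to destination
        kind, value = match[1]
        if kind == "iface":
            return value                      # resolved to an outbound interface
        dest = value                          # iterate: look up the next hop itself
    return None

table = {"172.16.0.0/16": ("via", "192.168.1.2"),   # next hop only, must be iterated
         "192.168.1.0/24": ("iface", "GE0/0/1")}    # directly connected route
```

Here a lookup for 172.16.5.5 first yields only a next hop (192.168.1.2), and a second lookup on that next hop resolves the outbound interface, mirroring the iteration the text describes.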

Command usage
The rip metricin command increases the metric of a received
route. After the route is added to the routing table, the metric of
the route is changed. Running this command affects route
selection of the local device and other devices.
The rip metricout command increases the metric of an
advertised route. The metric of the route remains unchanged
in the routing table. Running this command does not affect
route selection of the local device but affects route selection of
other devices.
View
Interface view
Parameters
rip metricout { value | { acl-number | acl-name acl-name | ip-prefix ip-prefix-name } value1 }: sets the additional metric to be added to an advertised route.
value: increases the metric of an advertised route. The
value ranges from 1 to 15 and defaults to 1.
acl-number: specifies a basic ACL number. The value
ranges from 2000 to 2999.
acl-name acl-name: specifies an ACL name. The value
is case-sensitive.
ip-prefix ip-prefix-name: specifies an IP prefix list name,
which must be unique.

value1: increases the metric of the route that passes the filtering of an ACL or IP prefix list.
Precautions
You can specify value1 to increase the metric of the advertised
RIP route that passes the filtering of an ACL or IP prefix list. If
a RIP route does not pass the filtering, its metric is increased
by 1.
Running the rip metricin/metricout commands will affect
route selection of other devices.
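The adjustment rule for advertised metrics can be sketched as a small function; the prefix set here stands in for the ACL or IP prefix list filter:

```python
# Sketch of the rip metricout rule: routes that pass the filter get value1
# added to the advertised metric; routes that do not pass get +1.
def metric_out(route_prefix, metric, permitted_prefixes, value1):
    add = value1 if route_prefix in permitted_prefixes else 1
    return min(metric + add, 16)   # RIP metric saturates at 16 (unreachable)
```

Note how a large value1 can push a route to metric 16, effectively making it unreachable to the neighbors that receive the update.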

Case description
The topology in this case is the same as that in the previous
case. To prevent interfaces from sending or receiving route
updates, suppress the interfaces or run the undo rip
input/output commands.

Command usage
The silent-interface command suppresses an interface to
allow it to receive but not send RIP packets. If an interface is
suppressed, direct routes of the network segment where the
interface resides can still be advertised to other interfaces.
This command can be used together with the peer (RIP)
command to advertise routes to a specified device.
The undo rip output/input command prohibits an interface
from sending/receiving RIP packets.
View
silent-interface: RIP view
undo rip output/input: interface view
Parameters
silent-interface { all | interface-type interface-number }
all: suppresses all the interfaces.
Precautions
After all interfaces are suppressed with the silent-interface all command, no individual interface can be activated again, because silent-interface all has the highest priority. In this case, all the interfaces of R4 are suppressed, so no interface of R4 can be activated.

Configuration verification
The display ip routing-table command output shows that: R3
can receive the update of route 172.16.0.0/24 from R5 but not
R4 and can receive the update of route 10.0.0.0/24 from R1
but not R2.

Case description
The topology in this case is the same as that in the previous
case. To prevent a device from receiving routes from a
specified neighbor, run the filter-policy gateway command.

Command usage
The filter-policy { acl-number | acl-name acl-name } import
command filters received routes based on an ACL.
The filter-policy gateway ip-prefix-name import command
filters routes based on the advertising gateway.
View
filter-policy { acl-number | acl-name acl-name | ip-prefix ip-prefix-name } import: RIP view
filter-policy gateway ip-prefix-name import: RIP view
Parameters
filter-policy { acl-number | acl-name acl-name } import
acl-number: specifies the number of a basic ACL used to
filter the destination address of routes.
acl-name acl-name: specifies the name of an ACL. The
name is case-sensitive and must start with a letter.
ip-prefix: filters routes based on an IP prefix list.
ip-prefix-name: specifies the name of an IP prefix list
used to filter the destination address of routes.
filter-policy gateway ip-prefix-name import
gateway: filters routes based on the advertising gateway.
ip-prefix-name: specifies the IP prefix list name of the
advertising gateway.

Configuration verification
Run the filter-policy gateway command to filter routes from a
specified neighbor. In this case, routes from R4 are filtered on
R3.

Case description
To reduce routing entries, Company A decides to summarize
routes. RIPv2 summarization includes automatic
summarization based on the main class network and manual
summarization. You can perform automatic summarization on
R1 and manual summarization on R3 and R4.

Command usage
summary [ always ]: enables classful summarization, under which summary routes are advertised at natural network boundaries. Classful summarization is enabled by default in RIPv2, but it becomes invalid if split horizon or poison reverse is configured. When the always parameter is configured, RIPv2 automatic summarization remains enabled regardless of the split horizon and poison reverse settings.
rip summary-address ip-address mask [ avoid-feedback ]:
configures a RIP router to advertise a local summary IP
address. If the avoid-feedback keyword is configured, the
local interface does not learn the summary route to the
advertised summary IP address. This configuration prevents
routing loops.

View
summary [ always ]: RIP view
rip summary-address ip-address mask [ avoid-feedback ]:
interface view
Parameters
summary [ always ]
always: If the always parameter is not configured, classful summarization becomes ineffective when split horizon or poison reverse is configured.

Therefore, to advertise summary routes at the natural network boundary without always, split horizon and poison reverse must be disabled in the corresponding views.
rip summary-address ip-address mask [ avoid-feedback ]
ip-address: specifies a summary IP address.
mask: specifies a network mask.
avoid-feedback: avoids learning the summary route to
the advertised summary IP address from the interface.

Case description
In this case, R1 and R2 connect over network 192.168.1.0/24.
R1 connects to network 10.0.0.0/24, and R2 connects to
network 172.16.0.0/24. Devices on the network run RIPv2 and
import the routes to networks where the devices reside. Only
the display command output of R1 is provided and only
information about this case is displayed.

Command usage
timers rip update age garbage-collect: adjusts a timer.
rip authentication-mode md5 nonstandard password-key key-id: configures the MD5 authentication mode.
Authentication packets use the nonstandard packet format.
nonstandard indicates that MD5 authentication packets use
the nonstandard packet format (IETF standards).
rip replay-protect [ window-range ]: enables the replay-protect function. window-range specifies the receive or
transmit buffer size for connections. The default value is 50.
View
timers rip update age garbage-collect: RIP view
rip authentication-mode md5 nonstandard password-key key-id: interface view
rip replay-protect [ window-range ]: interface view
Parameters
timers rip update age garbage-collect
update: specifies the interval for transmitting route
updates.
age: specifies the route aging time.
garbage-collect: specifies the interval at which an
unreachable route is deleted from the routing table, namely,
garbage-collect time defined in standards.

Precautions
If the three timers are configured incorrectly, routes become
unstable. The update time must be shorter than the aging time.
For example, if the update time is longer than the aging time, a
RIP router cannot notify route updates to neighbors within the
update time. In applications, the timeout period of the garbage-collect timer is not fixed. When the update timer is set to 30
seconds, the garbage-collect timer may range from 90 to 120
seconds. The reason is as follows: Before the RIP router
deletes an unreachable route from the routing table, it sends
Update packets four times to advertise the route with the metric of the route set to 16. Subsequently, all the neighbors
learn that the route is unreachable. Because a route may
become unreachable anytime within an update period, the
garbage-collect timer is 3 to 4 times the update timer.
Assume that the Identification field (a field in an IP header) of
the last RIP packet sent before a RIP interface goes Down is X.
After the interface becomes Up, the Identification field of the
RIP packet sent again becomes 0, and subsequent RIP
packets are discarded until a RIP packet with the Identification
field as X+1 is received. This, however, causes RIP routing information between the two ends to become unsynchronized or lost. To
address this issue, configure the rip replay-protect command
to enable the RIP interface to obtain the Identification field of
the last RIP packet sent before the RIP interface goes Down
and increase the Identification field in the subsequent RIP
packet by 1.

1. Check whether ARP is working properly.


2. Check whether related interfaces are Up.
3. Check whether RIP is enabled on the interfaces. Run the display
current-configuration configuration rip command to view
information about the RIP-enabled network segment. Check
whether the interfaces reside on the network segment. The network
address specified in the network command must be a natural
network address.
4. Check whether versions of the RIP packets sent by the peer end
and received by the local end match. By default, an interface sends
only RIPv1 packets but can receive RIPv1 and RIPv2 packets.
When an inbound interface receives RIP packets of a different
version, RIP routes may fail to be correctly received.
5. Check whether a routing policy is configured to filter received RIP
routes. If so, modify the routing policy.
6. Check whether UDP port 520 is disabled.
7. Check whether the undo rip input/output commands are
configured on the interfaces or whether a high metric is configured
using the rip metricin command.
8. Check whether the interfaces are suppressed.
9. Check whether the route metric has reached 16 (unreachable).
10. Check whether the interface authentication modes on two ends
match. If packet authentication fails, correctly configure interface
authentication modes.

1. Check whether RIP is enabled on the interfaces. Run the display current-configuration configuration rip command to view
information about the RIP-enabled network segment. Check
whether the interfaces reside on the network segment. The network
address specified in the network command must be a natural
network address.
2. Check whether versions of the RIP packets sent by the peer end
and received by the local end match. By default, an interface sends
only RIPv1 packets but can receive RIPv1 and RIPv2 packets.
When an inbound interface receives RIP packets of a different
version, RIP routes may fail to be correctly received.
3. Check whether a routing policy is configured to filter received RIP
routes. If so, modify the routing policy.
4. Check whether UDP port 520 is disabled.
5. Check whether the undo rip input/output commands are
configured on the interfaces or whether a high metric is configured
using the rip metricin command.
6. Check whether the interfaces are suppressed.
7. Check whether the route metric has reached 16 (unreachable).
8. Check whether the interface authentication modes on two ends
match. If packet authentication fails, correctly configure interface
authentication modes.

Case description
In this case, R1 connects to R2 through a frame relay network.
R1 connects to network 10.X.X.0/24, and R2 connects to
network 172.16.X.0/24.

Analysis process
In the pre-configurations of R1 and R2, the frame relay
configuration supports multicast.
R1 sends version 2 Update packets to R2 in multicast.
R1 and R2 can learn routes to each other.

Results
Generally, the peer command makes a router send packets in unicast, but it does not suppress multicast packets by default. Therefore, it is recommended to suppress the related interfaces (silent mode) when configuring this command, so that multicast packets are suppressed and only unicast packets are sent.

Results
The display rip route command displays the RIP routes
learned from other routers and values of timers for routes. The
Tag field indicates whether a RIP route is an internal or
external route. The default value is 0. The Flags field indicates
whether a RIP route is active or inactive. The value RA
indicates an active RIP route, and the value RG indicates an
inactive RIP route and that the garbage-collect timer has been
started.

Results
After the avoid-feedback keyword is specified, the local
interface does not learn the summary route to the advertised
summary IP address, preventing routing loops.
The filter-policy export command configures a filtering policy
to filter the routes to be advertised. Only the filtered routes can
be added to the routing table and advertised through Update
packets.

Case description
In this topology, R1, R2, and R3 connect to the same
broadcast domain. R3 connects to network 172.16.X.0/24 and
advertises routes to RIP.

Analysis process
In requirements 1 and 3, R1 is taken as an example. The
command output shows that R1 sends multicast packets and
does not start authentication.
Before meeting requirement 2, R1 can receive all routes to
172.16.X.0/24.

Results
The RIP authentication command can be configured only on an interface. Huawei devices support standard MD5 authentication and a Huawei proprietary authentication mode. You can run the display rip process-id interface interface-type verbose command to view the authentication mode.
Parameters
rip authentication-mode { simple password | md5 { nonstandard { password-key1 key-id | keychain keychain-name } | usual password-key2 } }
simple: indicates plain-text authentication.
password: Specifies the plain-text authentication password.
md5: indicates MD5 cipher-text authentication.
nonstandard: indicates that MD5 cipher-text
authentication packets use the nonstandard packet
format (IETF standards)
password-key1: specifies the authentication password in
cipher text.
key-id: specifies the key in MD5 cipher-text authentication.
keychain keychain-name: specifies a keychain name.
usual: indicates that MD5 cipher-text authentication
packets use the universal packet format (namely,
private standards).

password-key2: indicates the cipher-text authentication


keyword.
Precautions
Only one authentication password is used for each
authentication. If multiple authentication passwords are
configured, only the latest one takes effect. The authentication
password does not contain spaces.

Results
Only an ACL can be used; an IP prefix list cannot. When defining the ACLs, make sure the wildcard mask is used correctly: in this case, the bits of the wildcard mask set to 0 must match exactly, and the bits set to 1 are ignored.

Results
RIPv2 multicasts Update packets by default. You can run the
rip version 2 broadcast command in the interface view to
configure RIPv2 to broadcast Update packets.

IS-IS Overview
IS-IS is a dynamic routing protocol designed by the
International Organization for Standardization (ISO) for its
Connectionless Network Protocol (CLNP).
The Internet Engineering Task Force (IETF) extended and
modified IS-IS so that IS-IS can be applied to TCP/IP and
OSI environments. This version of IS-IS is called Integrated
IS-IS.
IS-IS Terms
Connectionless network service (CLNS)
CLNS consists of the following three protocols:

CLNP: is similar to the Internet Protocol (IP) of TCP/IP.

IS-IS: is a routing protocol between intermediate systems, that is, a protocol between routers.

ES-IS: End System to Intermediate System, is similar to the Address Resolution Protocol (ARP) and Internet Control Message Protocol (ICMP) of IP.
NSAP: The open systems interconnection (OSI) uses
NSAP(Network Service Access Point) to search for various
services at the transport layer on OSI networks. An NSAP
is similar to an IP address.
Note for Integrated IS-IS
Integrated IS-IS applies to TCP/IP and OSI environments.
Unless otherwise specified, the IS-IS protocol in this
material refers to Integrated IS-IS.

Overall IS-IS Topology


To support large-scale routing networks, IS-IS adopts a
two-level hierarchy consisting of a backbone area and non-backbone areas in an autonomous system (AS). Generally,
Level-1 routers are deployed in non-backbone areas,
whereas Level-2 and Level-1-2 routers are deployed in the
backbone area. Each non-backbone area connects to the
backbone area through a Level-1-2 router.
Topology Introduction
The figure shows a network that runs IS-IS. The network
topology is similar to the multi-area topology of an OSPF
network. The backbone area contains all routers in area
47.0001 and Level-1-2 routers in other areas.
In addition, Level-2 routers can be in different areas.
The topology differences between IS-IS and OSPF are as follows: In OSPF, a link can belong to only one area. In IS-IS, a link can belong to different areas.
In IS-IS, no area is physically defined as the backbone or
non-backbone area. In OSPF, Area 0 is defined as the
backbone area.
In IS-IS, Level-1 and Level-2 routers use the shortest path
first (SPF) algorithm to generate shortest path trees (SPTs)
respectively. In OSPF, the SPF algorithm is used only in
the same area, and inter-area routes are forwarded by the
backbone area.

Level-1 Router
A Level-1 router manages intra-area routing. It establishes
neighbor relationships with only Level-1 and Level-1-2
routers in the same area. A Level-1 router maintains a
Level-1 link state database (LSDB), which contains routes
in the local area.
A Level-1 router forwards packets destined for other areas
to the nearest Level-1-2 router.
A Level-1 router connects to other areas through a Level-1-2 router.
Level-2 Router
A Level-2 router manages inter-area routing. It can
establish neighbor relationships with Level-2 routers in the
same area or in other areas, as well as Level-1-2 routers.
A Level-2 router maintains a Level-2 LSDB, which contains
all routes in the IS-IS network.
All Level-2 routers form the backbone network of the
routing domain. They establish Level-2 neighbor
relationships and are responsible for inter-area
communication. Level-2 routers in the routing domain must
be physically contiguous to ensure the continuity of the
backbone network.
Level-1-2 Router
A router that belongs to both a Level-1 area and a Level-2
area is called a Level-1-2 router. It can establish Level-1
neighbor relationships with Level-1 and Level-1-2 routers in
the same area.

It can also establish Level-2 neighbor relationships with Level-2 and Level-1-2 routers in the same area or in other areas.
A Level-1 router connects to other areas through a Level-1-2 router.
A Level-1-2 router maintains a Level-1 LSDB for intra-area
routing and a Level-2 LSDB for inter-area routing.

Network Types Supported by IS-IS


For a non-broadcast multiple access (NBMA) network such as a frame relay (FR) network, you need to configure sub-interfaces and set the sub-interface type to point-to-point (P2P). IS-IS cannot run on point-to-multipoint (P2MP) links.
DIS
In a broadcast network, IS-IS needs to elect a designated
intermediate system (DIS) from all the routers.
The Level-1 DIS and Level-2 DIS are elected respectively.
The router with the highest DIS priority is elected as the
DIS. If there are multiple routers with the highest DIS
priority, the router with the largest MAC address is elected
as the DIS.
You can set different DIS priorities for electing DISs of
different levels.
A router whose DIS priority is 0 can also participate in DIS election. In addition, DIS election supports preemption.
All routers (including non-DIS routers) of the same level
and on the same network segment establish adjacencies.
However, the LSDB synchronization is ensured by DISs.
DISs are used to create and update pseudonodes, and
generate link state protocol data units (LSPs) of
pseudonodes. LSPs are used to describe network devices
on the network.
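The election rule above can be sketched in a few lines of Python (the data model is hypothetical; each router is a name, a DIS priority, and an interface MAC address represented as a 48-bit integer):

```python
def elect_dis(routers):
    """DIS election sketch: the highest priority wins; ties are broken by
    the largest interface MAC address. A priority of 0 still participates
    (unlike an OSPF DR election), and election is preemptive."""
    return max(routers, key=lambda r: (r[1], r[2]))[0]

lan = [("R1", 64, 0x00E0FC000001),
       ("R2", 64, 0x00E0FC0000AA),
       ("R3", 0,  0x00E0FCFFFFFF)]
print(elect_dis(lan))                                   # R2: tie on priority, larger MAC
print(elect_dis(lan + [("R4", 100, 0x00E0FC000002)]))   # R4 preempts the current DIS
```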

Pseudonode
A pseudonode is used to simulate a virtual node in the
broadcast network. It is not a real router. In IS-IS, a
pseudonode is identified by the system ID of the DIS and
the 1-byte Circuit ID (its value is not 0). The use of
pseudonodes simplifies the network topology.
When the network changes, the number of generated LSPs
is reduced, and the SPF calculation consumes fewer
resources.
Differences Between DIS in IS-IS and designated router (DR)/backup
designated router (BDR) in OSPF
In an IS-IS broadcast network, a router whose priority is 0
also takes part in DIS election. In an OSPF network, a
router whose priority is 0 does not take part in DR election.
In an IS-IS broadcast network, when a new router that
meets the requirements of being a DIS connects to the
network, the router is elected as the new DIS, and the
previous pseudonode is deleted. This causes a new
flooding of LSPs. In an OSPF network, when a new router
connects to the network, it is not immediately elected as the
DR even if it has the highest DR priority.
In an IS-IS broadcast network, all routers (including non-DIS routers) of the same level and on the same network segment establish adjacencies.

NSAP

An NSAP consists of the initial domain part (IDP) and the domain specific part (DSP). The lengths of the IDP and DSP are variable. The maximum length of an NSAP is 20 bytes and its minimum length is 8 bytes.

The IDP is similar to the network ID in an IP address. It is defined by the ISO and consists of the authority and format identifier (AFI) and the initial domain identifier (IDI). The AFI indicates the address allocation authority and address format, and the IDI identifies a domain.

The DSP is similar to the subnet ID and host address in an IP address. It consists of the High Order DSP (HODSP), the system ID, and the NSAP selector (SEL). The HODSP is used to divide areas, the system ID identifies a host, and the SEL indicates the service type.

The area address (area ID) consists of the IDP and the HODSP of the DSP. It identifies a routing domain and the areas within a routing domain; an area address is similar to an area number in OSPF. Routers in the same Level-1 area must have the same area address, while routers belonging to the Level-2 area can have different area addresses.

A system ID uniquely identifies a host or router in an area. On a device, the fixed length of the system ID is 48 bits (6 bytes). Generally, the device's router ID is converted into a system ID.

An SEL provides similar functions as the protocol identifier of IP. A transport protocol matches an SEL. The SEL is always 00 in IP.
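One common convention for deriving a system ID from a router ID, as mentioned above, is to pad each octet of the router ID to three digits and regroup the twelve digits into three 4-digit blocks. A sketch (the function name is mine):

```python
def router_id_to_system_id(router_id):
    """Convert a dotted-decimal router ID into a dotted-hex-style system ID
    using the zero-padding convention: 192.168.1.1 -> 192168001001
    -> 1921.6800.1001."""
    digits = "".join(f"{int(octet):03d}" for octet in router_id.split("."))
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))

print(router_id_to_system_id("192.168.1.1"))  # 1921.6800.1001
```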

NET
An NET indicates network layer information about a device. An
NET can be regarded as a special NSAP (SEL is 00). The NET
length is the same as the NSAP length. Its maximum length is 20
bytes and minimum length is 8 bytes. When configuring IS-IS on a
router, you only need to consider an NET but not an NSAP.

A maximum of three NETs can be configured during IS-IS configuration. When configuring multiple NETs, ensure that their system IDs are the same.
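Given the dotted notation used in this material, a NET such as 47.0001.1921.6800.1001.00 can be split into its area address, system ID, and SEL. The sketch below assumes the fixed 6-byte system ID (three 16-bit groups) and the 1-byte SEL:

```python
def parse_net(net):
    """Split a NET into (area address, system ID, SEL).
    The last group is the SEL (always 00 in a NET), the preceding three
    groups are the 6-byte system ID, and everything before that is the
    area address."""
    parts = net.split(".")
    sel = parts[-1]
    system_id = ".".join(parts[-4:-1])
    area = ".".join(parts[:-4])
    return area, system_id, sel

print(parse_net("47.0001.1921.6800.1001.00"))
# ('47.0001', '1921.6800.1001', '00')
```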

Hello PDU (IIH)

Level-1 LAN IIHs apply to Level-1 routers on broadcast networks.

Level-2 LAN IIHs apply to Level-2 routers on broadcast networks.

P2P IIHs apply to non-broadcast networks.

Compared to a LAN IIH, a P2P IIH does not have the Priority and
LAN ID fields, but has a Local Circuit ID field. The Priority field
indicates the DIS priority in a broadcast network, the LAN ID field
indicates the system ID of the DIS and pseudonode, and the
Local Circuit ID field indicates the local link ID.

IIHs are used for two neighbors to negotiate MTU by padding the
packets to the maximum size.
LSP

LSPs are similar to link-state advertisements (LSAs) in OSPF.

Level-1 routers transmit Level-1 LSPs.

Level-2 routers transmit Level-2 LSPs.

Level-1-2 routers transmit both Level-1 and Level-2 LSPs.

The ATT, OL, and IS-Type fields are major fields in an LSP. The ATT field indicates that the originating router is attached to one or more other areas. The OL field identifies the overload state. The IS-Type field indicates whether the router that generates the LSP is a Level-1 or Level-2 router (the value 01 indicates Level-1 and the value 11 indicates Level-2).

The LSP update interval is 15 minutes and the aging time is 20 minutes. However, an expired LSP is kept in the database for an additional 60 seconds (known as ZeroAgeLifetime) before it is cleared. The LSP retransmission interval is 5 seconds.

Sequence number PDU (SNP)

An SNP contains summary information of the LSDB and is used to maintain LSDB integrity and synchronization.

Complete SNPs (CSNPs) carry summaries of all LSPs in LSDBs, ensuring LSDB synchronization between neighboring routers. In a broadcast network, the DIS periodically sends CSNPs. The default interval for sending CSNPs is 10 seconds. On a P2P link, CSNPs are sent only when the neighbor relationship is established for the first time.

Partial SNPs (PSNPs) carry summaries of some of the LSPs in the LSDB, and are used to request and acknowledge LSPs.
Initial Packet Structure of an IS-IS PDU

Intra domain routing protocol discriminator

This field has a fixed value of 0x83 in all IS-IS PDUs.

PDU header length indicator
It identifies the length of the fixed header field.

Version/protocol ID extension
It has a fixed value of 1.

System ID length
It indicates the system ID length and has a fixed
value of 6 bytes.

PDU type
It identifies the PDU type.

Version
It has a fixed value of 1.

Reserve
It is set to all zeros.

Max areas
It indicates the maximum number of areas supported
by the intermediate system (IS). If the value is 3, the
IS supports a maximum of three areas.
IIHs on a P2P link
Circuit type
It indicates the level of the router that sends the PDU.
If this field is set to 0, the PDU will be ignored.
System ID
It indicates the system ID of the originating router
that sends the IIH.
Holding time
It indicates the interval for the peer router to wait for
the originating router to send the next IIH.
PDU length
It indicates the PDU length.
Local circuit ID
It is allocated to the local circuit by the originating
router when the router sends IIHs. This ID is unique
on the router interface. On the other end of the P2P
link, the circuit ID contained in IIHs may be the same
as or different from the local circuit ID.

Area address TLV
It indicates the area address of the originating router.
IP interface address TLV
It indicates the interface address or IP address of the
router that sends the PDU.
Protocol supported TLV
It indicates protocol types supported by the
originating router, such as IP, CLNP, and IPv6.
Restart option TLV
It is used for graceful restart.
Point-to-point adjacency state TLV
It indicates that three-way handshake is supported.
Multi topology TLV
It indicates that multi-topology is supported.
Padding TLV
It indicates that IIH padding is supported.

LSP

PDU length
It indicates the PDU length.
Remaining lifetime
It indicates the time before an LSP expires
LSP ID
It can be the system ID, pseudonode ID, or LSP
number.
The value 0000.0000.0001.00-00 indicates a
common LSP.
The value 0000.0000.0001.01-00 indicates a
pseudonode LSP.
The value 0000.0000.0001.00-01 indicates a
fragment of a common LSP.
Sequence number
It indicates the sequence number of the LSP. The
value starts from 0 and increases by 1. The
maximum value is 2^32-1.
Checksum
It indicates the checksum, which is calculated over the LSP content from the field following the Remaining Lifetime field to the end of the PDU.
P bit
It is used to repair segmented areas and is similar to
the OSPF virtual link. Most vendors do not support
this feature.
ATT bit
It indicates that the originating router is connected to
one or multiple areas.
OL bit
It identifies the overload state.
IS type
It indicates the router type.

Protocol supported TLV
It indicates protocol types supported by the originating router, such as IP, CLNP, and IPv6.
Area address TLV
It indicates the area address of the originating router.
IS reachability TLV
It is used to list neighbors of the originating router.
IP interface address TLV
It indicates the interface address or IP address of the
router that sends the PDU.
IP internal reachability TLV
It indicates that the IP address is internally reachable.
It is used to advertise the IP address and related
mask information of the area that directly connects to
the router that sends the LSP. A pseudonode LSP
does not contain this TLV.
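The three LSP ID formats listed earlier (system ID, pseudonode ID, and fragment number, written as xxxx.xxxx.xxxx.PP-FF) can be parsed and classified with a short sketch (the function names are mine):

```python
def parse_lsp_id(lsp_id):
    """Parse an LSP ID such as 0000.0000.0001.01-00 into
    (system_id, pseudonode_id, fragment_number)."""
    body, frag = lsp_id.rsplit("-", 1)
    system_id, pseudonode = body.rsplit(".", 1)
    return system_id, int(pseudonode, 16), int(frag, 16)

def classify(lsp_id):
    """A non-zero pseudonode ID marks a pseudonode LSP; a non-zero
    fragment number marks a fragment of a common LSP."""
    _, pseudonode, frag = parse_lsp_id(lsp_id)
    if pseudonode != 0:
        return "pseudonode LSP"
    return "fragment of a common LSP" if frag != 0 else "common LSP"

print(classify("0000.0000.0001.00-00"))  # common LSP
print(classify("0000.0000.0001.01-00"))  # pseudonode LSP
print(classify("0000.0000.0001.00-01"))  # fragment of a common LSP
```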
CSNP and PSNP
PDU length
It indicates the PDU length.
Source-ID
It indicates the system ID of the originating router.
Start LSP-ID
It starts from 0000.0000.0000.00-00.
It ends at ffff.ffff.ffff.ff-ff.
LSP entries
LSP summary information

Routers of different levels cannot establish neighbor relationships. Level-2 routers cannot establish neighbor relationships with Level-1 routers. However, Level-1-2 routers can establish Level-1 neighbor relationships with Level-1 routers in the same area, and establish Level-2 neighbor relationships with Level-2 routers in the same area or in different areas.
Level-1 routers can only establish Level-1 neighbor relationships with
Level-1 or Level-1-2 routers in the same area.
IP addresses of IS-IS interfaces on both ends of a link must be on the
same network segment.

According to IS-IS principles, the establishment of IS-IS neighbor relationships is irrelevant to IP addresses. Therefore, routers that establish neighbor relationships may be on different network segments. To solve this problem, Huawei devices check the network segment of routers to ensure that IS-IS neighbor relationships are correctly established.

You can configure interfaces not to check IP addresses on a P2P network if the network does not need to check the IP addresses. In a broadcast network, you need to simulate Ethernet interfaces as P2P interfaces before configuring the interfaces not to check IP addresses.

Two routers running IS-IS need to establish a neighbor relationship before exchanging protocol packets to implement routing. On different networks, the modes for establishing IS-IS neighbor relationships are different.

In a broadcast network, routers exchange LAN IIHs to establish neighbor relationships. LAN IIHs are classified into Level-1 LAN IIHs (with the multicast MAC address 01-80-C2-00-00-14) and Level-2 LAN IIHs (with the multicast MAC address 01-80-C2-00-00-15). Level-1 routers exchange Level-1 LAN IIHs to establish neighbor relationships. Level-2 routers exchange Level-2 LAN IIHs to establish neighbor relationships. Level-1-2 routers exchange both Level-1 LAN IIHs and Level-2 LAN IIHs to establish neighbor relationships.
In this example, two Level-2 routers establish a neighbor relationship
on a broadcast link.

R1 multicasts a Level-2 LAN IIH (with the multicast MAC address 01-80-C2-00-00-15) with no neighbor ID specified.

R2 receives the packet and sets the status of the neighbor relationship with R1 to Initial. R2 then responds to R1 with a Level-2 LAN IIH, indicating that R1 is a neighbor of R2.

R1 receives the packet and sets the status of the neighbor relationship with R2 to Up. R1 then responds to R2 with a Level-2 LAN IIH, indicating that R2 is a neighbor of R1.

R2 receives the packet and sets the status of the neighbor relationship with R1 to Up. R1 and R2 successfully establish a neighbor relationship.

The network is a broadcast network, so a DIS needs to be elected. After the neighbor relationship is established, routers wait for two Hello intervals before electing the DIS. The Hello PDUs exchanged by the routers contain the Priority field. The router with the highest priority is elected as the DIS. If the routers have the same priority, the router with the largest interface MAC address is elected as the DIS. In an IS-IS network, the DIS sends Hello PDUs at an interval of 10/3 seconds, and non-DIS routers send Hello PDUs at an interval of 10 seconds.
Differences between IS-IS Adjacencies and OSPF Adjacencies

In IS-IS, two neighbor routers establish an adjacency once they exchange Hello PDUs. In OSPF, two routers establish a neighbor relationship when they are in 2-Way state, and establish an adjacency when they are in Full state.

In IS-IS, a router whose priority is 0 can participate in DIS election. In OSPF, a router whose priority is 0 does not take part in DR election.

In IS-IS, the DIS election is preemptive. In OSPF, a router cannot preempt to be the DR or BDR once the DR or BDR has been elected.

Unlike the establishment of a neighbor relationship on a broadcast network, the establishment of a neighbor relationship on a P2P network can use one of two modes: two-way mode and three-way mode.

Two-Way Mode

Upon receiving a P2P IIH from a peer router, a router considers the peer router Up and establishes a neighbor relationship with it.

Unidirectional communication may occur.

Three-Way Mode

A neighbor relationship is established after P2P IIHs are sent three times. The establishment of a neighbor relationship on a P2P network is then similar to that on a broadcast network.

The process of synchronizing LSDBs between a newly added router and the DIS on a broadcast link is as follows:

Assume that the newly added router R3 has established neighbor relationships with R2 (the DIS) and R1.

R3 sends an LSP to a multicast address (01-80-C2-00-00-14 in a Level-1 area and 01-80-C2-00-00-15 in a Level-2 area). All neighbors on the network can receive the LSP.

The DIS on the network segment adds the received LSP to its LSDB. After the CSNP timer expires, the DIS sends CSNPs at an interval of 10 seconds to synchronize the LSDBs on the network.

R3 receives the CSNPs from the DIS, checks its LSDB, and sends a PSNP to the DIS to request the LSPs it does not have.

The DIS receives the PSNP and sends the required LSPs to R3 for LSDB synchronization.
The process of updating the LSDB of the DIS is as follows:

The DIS receives an LSP and searches for the matching record in the LSDB. If no matching record exists, the DIS adds the LSP to the LSDB and multicasts the new LSP.

If the sequence number of the received LSP is larger than that of the corresponding LSP in the LSDB, the DIS replaces the local LSP with the received LSP and multicasts the new LSP. If the sequence number of the received LSP is smaller than that of the LSP in the LSDB, the DIS sends the local LSP to the inbound interface.

If the sequence number of the received LSP is the same as that of the corresponding LSP in the LSDB, the DIS compares the remaining lifetime of the two LSPs. If the remaining lifetime of the received LSP is smaller than that of the LSP in the LSDB, the DIS replaces the local LSP with the received LSP and multicasts the new LSP. If the remaining lifetime of the received LSP is larger than that of the LSP in the LSDB, the DIS sends the local LSP to the inbound interface.

If the sequence number and the remaining lifetime of the received LSP are the same as those of the corresponding LSP in the LSDB, the DIS compares the checksums of the two LSPs. If the checksum of the received LSP is larger than that of the LSP in the LSDB, the DIS replaces the local LSP with the received LSP and multicasts the new LSP. If the checksum of the received LSP is smaller than that of the LSP in the LSDB, the DIS sends the local LSP to the inbound interface.

If the sequence number, remaining lifetime, and checksum of the received LSP are the same as those of the corresponding LSP in the LSDB, the LSP is not forwarded.
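The comparison order above (sequence number first, then remaining lifetime, then checksum) can be captured in a small Python sketch. The dict keys are my own shorthand, not real PDU field names:

```python
def fresher(received, local):
    """Decide which copy of the same LSP is newer, following the order in
    the text: larger sequence number wins; on a tie, smaller remaining
    lifetime wins; on a further tie, larger checksum wins.
    Returns 'received', 'local', or 'same' (identical copies are not
    forwarded)."""
    if received["seq"] != local["seq"]:
        return "received" if received["seq"] > local["seq"] else "local"
    if received["lifetime"] != local["lifetime"]:
        return "received" if received["lifetime"] < local["lifetime"] else "local"
    if received["checksum"] != local["checksum"]:
        return "received" if received["checksum"] > local["checksum"] else "local"
    return "same"

print(fresher({"seq": 8, "lifetime": 900, "checksum": 0x1A2B},
              {"seq": 7, "lifetime": 1200, "checksum": 0x3C4D}))  # received
```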

The process of synchronizing LSDBs on a P2P network is as follows:

After establishing a neighbor relationship, R1 and R2 send a CSNP to each other. If the LSDB of the neighbor and the received CSNP are not synchronized, the neighbor sends a PSNP to request the required LSPs.

Assume that R2 requests the required LSP from R1. R1 sends the required LSP to R2, starts the LSP retransmission timer, and waits for a PSNP from R2 as an acknowledgement of the received LSP.

If R1 does not receive a PSNP from R2 before the LSP retransmission timer expires, R1 resends the LSP until it receives a PSNP from R2.
The process of updating LSDBs on a P2P link is as follows:

If the sequence number of the received LSP is smaller than that of the corresponding LSP in the LSDB, the router directly sends the local LSP to the neighbor and waits for a PSNP from the neighbor. If the sequence number of the received LSP is larger than that of the corresponding LSP in the LSDB, the router adds the received LSP to its LSDB, sends a PSNP to acknowledge the received LSP, and then sends the received LSP to all its neighbors except the neighbor that sent the LSP.

If the sequence number of the received LSP is the same as that of the corresponding LSP in the LSDB, the router compares the remaining lifetime of the two LSPs. If the remaining lifetime of the received LSP is smaller than that of the LSP in the LSDB, the router replaces the local LSP with the received LSP, sends a PSNP to acknowledge the received LSP, and sends the received LSP to all neighbors except the neighbor that sent the LSP. If the remaining lifetime of the received LSP is larger than that of the LSP in the LSDB, the router sends the local LSP to the neighbor and waits for a PSNP.

If the sequence number and remaining lifetime of the received LSP are the same as those of the corresponding LSP in the LSDB, the router compares the checksums of the two LSPs. If the checksum of the received LSP is larger than that of the LSP in the LSDB, the router replaces the local LSP with the received LSP, sends a PSNP to acknowledge the received LSP, and sends the received LSP to all neighbors except the neighbor that sent the LSP. If the checksum of the received LSP is smaller than that of the LSP in the LSDB, the router sends the local LSP to the neighbor and waits for a PSNP.

If the sequence number, remaining lifetime, and checksum of the received LSP are the same as those of the corresponding LSP in the LSDB, the LSP is not forwarded.

On a P2P network, a PSNP has the following functions:

It is used to acknowledge a received LSP.

It is used to request a required LSP.

Assume that R1 sends packets to R6. The default situation is as follows:

As a Level-1 router, R1 does not know routes outside its area, so it sends packets to other areas through the default route generated by the nearest Level-1-2 router (R3). Therefore, R1 selects the route R1->R3->R5->R6, which is not the optimal route, to forward the packets.


To solve this problem, IS-IS provides route leaking. You can configure access control lists (ACLs) and routing policies and mark routes with tags on Level-1-2 routers to select eligible routes. A Level-1-2 router can then advertise routing information of other Level-1 areas and the backbone area to its own Level-1 area.
If route leaking is enabled on Level-1-2 routers (R3 and R4), Level-1
routers in area 47.0001 can know of routes outside area 47.0001 and
routes passing through the two Level-1-2 routers. After route calculation,
the forwarding path becomes R1->R2->R4->R5->R6, which is the
optimal route from R1 to R6.

Principles

LSPs with the overload bit are still flooded on the network,
but the LSPs are not used when routes that pass through a
router configured with the overload bit are calculated. That is,
after the overload bit is set on a router, other routers ignore this
router when performing SPF calculation and calculate only the
direct routes of the router.

Topology

R2 forwards the packets from R1 to R3. If the overload bit on R2 is set to 1, R1 considers the LSDB of R2 incomplete and sends packets to R3 through R4 and R5. This process does not affect packets sent to the directly connected addresses of R2.
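A minimal Python sketch of how SPF can honor the overload bit, assuming a toy five-router topology like the one described (R1-R2-R3 and R1-R4-R5-R3, all links with cost 1). An overloaded node is never expanded as a transit hop, so only its directly connected destinations stay reachable through it:

```python
import heapq

def spf(graph, source, overloaded=frozenset()):
    """Dijkstra's SPF. Nodes with the OL bit set are not used as transit
    nodes (their links are not expanded), but routes to the overloaded
    node itself are still calculated."""
    dist = {source: 0}
    pq = [(0, source)]
    visited = set()
    while pq:
        d, node = heapq.heappop(pq)
        if node in visited:
            continue
        visited.add(node)
        # Skip expansion of overloaded routers (unless it is the source).
        if node in overloaded and node != source:
            continue
        for nbr, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(pq, (nd, nbr))
    return dist

g = {
    "R1": [("R2", 1), ("R4", 1)],
    "R2": [("R1", 1), ("R3", 1)],
    "R3": [("R2", 1), ("R5", 1)],
    "R4": [("R1", 1), ("R5", 1)],
    "R5": [("R4", 1), ("R3", 1)],
}
print(spf(g, "R1"))                     # R3 reached via R2 at cost 2
print(spf(g, "R1", overloaded={"R2"}))  # R3 now via R4-R5 at cost 3; R2 itself still cost 1
```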

A device enters the overload state in the following situations:

A device automatically enters the overload state due to exceptions.

You can manually configure a device to enter the overload state.
Results of entering the overload state

If the system enters the overload state due to exceptions, the system deletes all the imported or leaked routes.
If the system is configured to enter the overload state, the system
determines whether to delete all the imported or leaked routes
based on the configuration.

Fast Convergence

Incremental SPF (I-SPF): recalculates only the routes of the changed nodes rather than all nodes when the network topology changes (except for the first calculation, which involves all nodes), thereby speeding up route calculation. I-SPF improves the SPF algorithm; the shortest path tree (SPT) it generates is the same as that generated by the standard SPF algorithm. This decreases CPU usage and speeds up network convergence.

Partial route calculation (PRC): calculates only the changed routes when the network topology changes. Similar to I-SPF, PRC calculates only the changed routes, but it does not calculate the shortest path. It updates routes based on the SPT calculated by I-SPF. In route calculation, a leaf represents a route, and a node represents a router. If the SPT changes after I-SPF calculation, PRC processes all the leaves on the changed nodes only. If the SPT remains unchanged, PRC processes only the changed leaves. For example, if IS-IS is enabled on an interface of a node, the SPT calculated by I-SPF remains unchanged, and PRC updates only the routes of this interface, consuming less CPU resources.

Intelligent Timer
LSP generation intelligent timer: There is a minimum interval restriction on LSP generation to prevent frequent LSP flapping from affecting the network. The same LSP cannot be generated repeatedly within the minimum interval, which is 5 seconds by default. This fixed restriction, however, slows down route convergence. In IS-IS, if local routing information changes, a router generates a new LSP to advertise the change. When local routing information changes frequently, the newly generated LSPs consume a lot of system resources; conversely, if the delay in generating an LSP is too long, the router cannot advertise changed routing information to its neighbors in time, reducing the network convergence speed. With the intelligent timer, the delay in generating an LSP for the first time is determined by init-interval, and the delay for the second time is determined by incr-interval. From the third time on, the delay doubles each time until it reaches the value specified by max-interval. After the delay remains at max-interval three times, or the IS-IS process is restarted, the delay decreases back to init-interval. When only max-interval is specified, the intelligent timer functions as an ordinary one-time triggering timer.
SPF calculation intelligent timer: In IS-IS, routes are calculated when the LSDB changes. However, frequent route calculations consume a lot of system resources and decrease system performance, while delaying SPF calculation too long reduces the route convergence speed. Delaying SPF calculation appropriately therefore improves route calculation efficiency. With the intelligent timer, the delay in SPF calculation for the first time is determined by init-interval, and the delay for the second time is determined by incr-interval. From the third time on, the delay doubles each time until it reaches the value specified by max-interval. After the delay remains at max-interval three times, or the IS-IS process is restarted, the delay decreases back to init-interval. If incr-interval is not specified, the delay in SPF calculation for the first time is determined by init-interval, and from the second time on, the delay is determined by max-interval; after the delay remains at max-interval three times, or the IS-IS process is restarted, the delay decreases back to init-interval. When only max-interval is specified, the intelligent timer functions as an ordinary one-time triggering timer.
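The back-off sequence described above can be sketched with example values (the numbers below are illustrative, not product defaults):

```python
def intelligent_timer_delays(init, incr, maximum, count):
    """Delays produced by the intelligent timer: 1st = init-interval,
    2nd = incr-interval, then doubling each time, capped at max-interval."""
    delays = []
    for n in range(count):
        if n == 0:
            d = init
        elif n == 1:
            d = incr
        else:
            d = min(incr * (2 ** (n - 1)), maximum)
        delays.append(d)
    return delays

print(intelligent_timer_delays(init=50, incr=200, maximum=5000, count=7))
# [50, 200, 400, 800, 1600, 3200, 5000]
```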

LSP fast flooding: Because the number of LSPs can be huge, IS-IS periodically floods LSPs in batches to reduce the impact of LSP flooding on network devices. By default, the minimum interval for sending LSPs on an interface is 50 milliseconds and a maximum of 10 LSPs are sent at a time. After the flash-flood function is enabled, when LSPs change and trigger SPF recalculation, IS-IS immediately floods the LSPs that cause the recalculation instead of sending them periodically. When the network topology changes, the LSDBs of the devices on the network become inconsistent; this function effectively reduces the time during which the LSDBs are inconsistent and improves fast convergence. When a network fault occurs, only a small number of LSPs change even though a large number of LSPs exist. Therefore, IS-IS only needs to flood the changed LSPs, consuming few system resources.
Priority-based Convergence

You can use the IP prefix list to filter routes and configure different
convergence priorities for different routes so that important routes
are converged first, improving the network reliability.

The convergence priorities of IS-IS routes are classified into critical, high, medium, and low in descending order.

In area authentication and routing domain authentication, you can configure a router to authenticate LSPs and SNPs separately in the following ways:

The router sends LSPs and SNPs carrying the authentication TLV and verifies the authentication information of the received LSPs and SNPs.

The router sends LSPs carrying the authentication TLV and verifies the authentication information of the received LSPs. The router sends SNPs carrying the authentication TLV but does not verify the authentication information of the received SNPs.

The router sends LSPs carrying the authentication TLV and verifies the authentication information of the received LSPs. The router sends SNPs without the authentication TLV and does not verify the authentication information of the received SNPs.

The router sends LSPs and SNPs carrying the authentication TLV but does not verify the authentication information of the received LSPs and SNPs.

Concepts

Originating system: a router that runs the IS-IS protocol. After LSP fragment extension is enabled, you can configure virtual systems for the router. The originating system refers to the IS-IS process.

System ID: the system ID of the originating system.

Additional system ID: configured for a virtual system after LSP fragment extension is enabled. A maximum of 256 extended LSP fragments can be generated for each additional system ID. Like a normal system ID, an additional system ID must be unique in a routing domain.

Virtual system: a system identified by an additional system ID. It is used to generate extended LSP fragments.
Principles

IS-IS floods LSPs to advertise link state information. Because one LSP carries limited information, IS-IS fragments LSPs. Each LSP fragment is uniquely identified by an LSP ID, which consists of the system ID, the pseudonode ID (0 for a common LSP and a non-zero value for a pseudonode LSP), and the LSP number (the LSP fragment number) of the node or pseudonode that generates the LSP. The LSP number is 1 byte long. Therefore, an IS-IS router can generate a maximum of 256 LSP fragments, restricting the amount of link information that the router can advertise.

The LSP fragment extension feature enables an IS-IS router to generate more LSP fragments. You can configure up to 50 virtual systems for the router, and each virtual system can generate a maximum of 256 LSP fragments. Together with the originating system, an IS-IS router can then generate a maximum of 13,056 LSP fragments.
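A quick arithmetic check of the limits above:

```python
# The 1-byte LSP number allows 256 fragments per system.
FRAGMENTS_PER_SYSTEM = 2 ** 8   # 256
MAX_VIRTUAL_SYSTEMS = 50

# Originating system plus up to 50 virtual systems:
total = (1 + MAX_VIRTUAL_SYSTEMS) * FRAGMENTS_PER_SYSTEM
print(total)  # 13056
```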
An IS-IS router can run the LSP fragment extension feature in two
modes.

Mode-1

It is used when some routers on the network do not support LSP fragment extension.

Virtual systems participate in SPF calculation. The originating system advertises LSPs containing information about links to each virtual system. Similarly, each virtual system advertises LSPs containing information about links to the originating system. Virtual systems thus look like physical routers that connect to the originating system.

The LSPs sent by a virtual system contain the same area address and overload bit as those in a common LSP. If the LSPs sent by a virtual system contain TLVs specified by other features, these TLVs must be the same as those in common LSPs.

The virtual system carries neighbor information indicating that its neighbor is the originating system, with the metric equal to the maximum value (64 for the narrow metric) minus 1. The originating system carries neighbor information indicating that its neighbor is the virtual system, with the metric 0. This ensures that the virtual system is the downstream node of the originating system when other routers calculate routes.

As shown in the topology, R2 does not support LSP fragment extension, and R1 is configured to support LSP fragment extension in mode-1. R1-1 and R1-2 are virtual systems of R1 and send LSPs carrying some routing information of R1. After receiving LSPs from R1, R1-1, and R1-2, R2 considers that there are three individual routers at the remote end and calculates routes accordingly. Because the cost of the route from R1 to R1-1 and the cost of the route from R1 to R1-2 are both 0, the cost of the route from R2 to R1 is the same as the cost of the route from R2 to R1-1.

The LSPs generated by virtual systems contain only the originating system as the neighbor (the neighbor type is P2P). In addition, virtual systems are considered only as leaves.

Mode-2

It is used when all the routers on the network support LSP fragment extension. In this mode, virtual systems do not participate in SPF calculation.

All the routers on the network know that the LSPs generated by virtual systems actually belong to the originating system.

R2 supports LSP fragment extension, and R1 is configured to support LSP fragment extension in mode-2. R1-1 and R1-2 are virtual systems of R1 and send LSPs carrying some of R1's routing information.
When receiving LSPs from R1-1 and R1-2, R2 obtains the IS Alias ID TLV and knows that the originating system of R1-1 and R1-2 is R1. R2 then considers that the information advertised by R1-1 and R1-2 belongs to R1.
Precautions
After LSP fragment extension is configured, the system prompts you to restart the IS-IS process if information is lost because LSPs overflow. After the restart, the originating system loads as much routing information as possible into its own LSPs and adds the overflowed information to the LSPs of the virtual systems for transmission.
If there are devices of other vendors on the network, LSP fragment extension must be set to mode-1; otherwise, devices of other vendors cannot identify the LSPs.
It is recommended that you configure LSP fragment extension and virtual systems before establishing IS-IS neighbor relationships or importing routes. If neighbor relationships are already established or routes are already imported, IS-IS may carry more information than 256 fragments can hold; you must then configure LSP fragment extension and virtual systems, and the configuration takes effect only after the IS-IS process is restarted. Therefore, exercise caution when you establish IS-IS neighbor relationships or import routes first.
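The feature described above can be sketched in VRP-style configuration. This is a minimal example, not taken from the case; the process ID, NET, and virtual system IDs are assumptions:

```
#
isis 1
 network-entity 47.0001.0000.0000.0001.00
 # Enable LSP fragment extension; use mode-1 if routers that
 # do not support the feature are present on the network.
 lsp-fragments-extend level-2 mode-1
 # Each additional (virtual) system ID allows up to 256 more fragments.
 virtual-system 0000.0000.0011
 virtual-system 0000.0000.0012
#
```

Configure this before neighbors come up or routes are imported; otherwise the IS-IS process must be restarted for the configuration to take effect.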

IS-IS Administrative Tag

Administrative tags control the advertisement of IP prefixes in an IS-IS routing domain to simplify route management. You can use administrative tags to control the import of routes of different levels and different areas, and to control IS-IS multi-instances (by tag) running on the same router.
Topology

Assume that R1 needs to receive only Level-1 routing information from R2, R3, and R4. To meet this requirement, configure the same administrative tag on the IS-IS interfaces of R2, R3, and R4. Then configure the Level-1-2 router in area 47.0003 to leak only the routes matching the configured administrative tag from the Level-2 area to the Level-1 area. This configuration allows R1 to receive only the Level-1 routing information from R2, R3, and R4.
Precautions

To use administrative tags, you must enable the IS-IS wide metric
attribute.

Case Description

In this case, the addresses for interconnecting devices are as follows: if RX interconnects with RY, their interconnection addresses are XY.1.1.X and XY.1.1.Y respectively, with a 24-bit network mask.
Remarks

R4 and R5 are Level-1-2 routers. They participate in calculating both Level-1 and Level-2 routes and maintain both the Level-1 and Level-2 LSDBs.

Command Usage

The is-level command sets the level of an IS-IS router. By default, the level of an IS-IS router is Level-1-2.

The isis circuit-level command sets the link type of an interface.
View

is-level: IS-IS view

isis circuit-level: interface view


Parameters

is-level { level-1 | level-1-2 | level-2 }


level-1: sets a router as a Level-1 router, which
calculates only intra-area routes and maintains a Level-1
LSDB.
level-1-2: sets a router as a Level-1-2 router, which
calculates Level-1 and Level-2 routes and maintains a
Level-1 LSDB and a Level-2 LSDB.
level-2: sets a router as a Level-2 router, which
exchanges only Level-2 LSPs, calculates only Level-2
routes, and maintains a Level-2 LSDB.
isis circuit-level [ level-1 | level-1-2 | level-2 ]
level-1: specifies the Level-1 link type. That is, only
Level-1 neighbor relationship can be established on the
interface.
level-1-2: specifies the Level-1-2 link type. That is, both
Level-1 and Level-2 neighbor relationships can be
established on the interface.

level-2: specifies the Level-2 link type. That is, only


Level-2 neighbor relationship can be established on the
interface.
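As a minimal sketch of the two commands above (the process ID, NET, and interface name are assumptions, not from the case):

```
#
isis 1
 is-level level-1-2                  # default level; shown for clarity
 network-entity 47.0003.0000.0000.0004.00
#
interface GigabitEthernet0/0/1
 isis enable 1
 # Send and receive only Level-2 Hello packets on this link.
 isis circuit-level level-2
#
```

Note that isis circuit-level takes effect only when the router itself is Level-1-2.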
Precautions

If a router is a Level-1-2 router and needs to establish a neighbor relationship at a specified level (Level-1 or Level-2) with a peer router, you can run the isis circuit-level command so that the local interface sends and receives only Hello packets of the specified level on the P2P link. This configuration prevents the router from processing too many Hello packets and saves bandwidth.

The configuration of the isis circuit-level command takes effect on the interface only when the IS-IS system type is Level-1-2; otherwise, the level configured using the is-level command is used as the link type.

In a P2P network, the Circuit ID uniquely identifies a local interface. In a broadcast network, the Circuit ID consists of the system ID and the pseudonode ID.

Case Description

The topology in this case is the same as that in the previous case.
It is required that no DIS can be elected between R4 and R6 or
between R5 and R6. That is, the links between R4 and R6 and
between R5 and R6 cannot be broadcast links.

The smallest priority value that still allows a router to participate in the DIS election is 0 (in IS-IS, unlike OSPF, priority 0 does not exclude a router from the election).

Command Usage

The isis dis-priority command sets the priority of the interface that is a candidate for the DIS at a specified level.

The isis circuit-type command simulates the network type of an interface as P2P.
View

isis dis-priority: interface view

isis circuit-type: interface view


Parameters

isis dis-priority priority [ level-1 | level-2 ]
priority: specifies the priority for DIS election. The value ranges from 0 to 127; the default value is 64. The greater the value, the higher the priority.
level-1: indicates the priority for electing the Level-1 DIS.
level-2: indicates the priority for electing the Level-2 DIS.

isis circuit-type p2p: sets the interface network type to P2P.
Precautions

The isis dis-priority command takes effect only on a broadcast link.

The isis circuit-type command takes effect only on a broadcast interface. The network types of the IS-IS interfaces on both ends of a link must be the same; otherwise, the two interfaces cannot establish a neighbor relationship.
Configuration Verification

Run the display isis interface process-id command and view the DIS field in the command output.
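A minimal sketch of the two interface commands (the interface names are assumptions):

```
#
interface GigabitEthernet0/0/1
 isis enable 1
 # Priority 0 keeps the router in the IS-IS DIS election
 # but makes it the least preferred candidate.
 isis dis-priority 0 level-2
#
interface GigabitEthernet0/0/2
 isis enable 1
 # No DIS is elected on a P2P link; both ends must use
 # the same network type.
 isis circuit-type p2p
#
```

Verify the result with display isis interface 1 and check the DIS field.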

Case Description

The topology in this case is the same as that in the previous case.
Company A requires route control. When configuring tags, you
should also enable IS-IS wide metric on all devices in the network
so that the tags can be transmitted in the entire network. In
addition, Level-2 routes cannot be directly leaked to Level-1 areas
and need to be configured manually.

Command Usage

The import-route command configures IS-IS to import routes from other routing protocols.

The import-route isis level-2 into level-1 command controls route leaking from Level-2 areas to Level-1 areas. The command needs to be configured on the Level-1-2 routers that connect to external areas.

The cost-style command sets the cost style of routes sent and
received by an IS-IS router.
View

import-route: IS-IS view

import-route isis level-2 into level-1: IS-IS view

cost-style: IS-IS view


Parameters

import-route isis level-2 into level-1 [ filter-policy { acl-number | acl-name acl-name | ip-prefix ip-prefix-name | route-policy route-policy-name } | tag tag ]
filter-policy: indicates the route filtering policy.
acl-number: specifies the number of a basic ACL.
acl-name acl-name: specifies the name of a named ACL.
ip-prefix ip-prefix-name: specifies the name of an IP prefix list. Only the routes that match the IP prefix list can be leaked.
route-policy route-policy-name: specifies the name of a routing policy.
tag tag: assigns administrative tags to the leaked routes.

cost-style { narrow | wide | wide-compatible }
narrow: indicates that the device can receive and send routes with the narrow cost style.
wide: indicates that the device can receive and send routes with the wide cost style.
wide-compatible: indicates that the device can receive routes with the narrow or wide cost style but sends only routes with the wide cost style.
Precautions

To transmit tags in the entire network, run the cost-style wide command on all devices in the network.
Configuration Verification

Run the display isis route command to view tag information.
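Putting the commands above together, a minimal VRP-style sketch for tag-based route leaking (the tag value, policy name, and process ID are assumptions):

```
#
route-policy RP-L2-TO-L1 permit node 10
 if-match tag 100
#
isis 1
 # Wide metric is required for tags to be carried network-wide.
 cost-style wide
 # On the Level-1-2 border router: leak only tagged routes
 # from Level-2 into Level-1.
 import-route isis level-2 into level-1 filter-policy route-policy RP-L2-TO-L1
#
```

The routers originating the prefixes would assign the tag (for example, with isis tag-value on their interfaces), and all devices must run cost-style wide for the tag to propagate.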

Case Description

The topology in this case is the same as that in the previous case.
Company A reconstructs its network. IS-IS uses ACLs, IP prefix
lists, and tags to control routes.

Command Usage

The filter-policy import command allows IS-IS to filter the received routes to be added to the IP routing table.
View

filter-policy import: IS-IS view


Parameters

filter-policy { acl-number | acl-name acl-name | ip-prefix ip-prefix-name | route-policy route-policy-name } import

acl-number: specifies the number of a basic ACL.
acl-name acl-name: specifies the name of a named ACL.
ip-prefix ip-prefix-name: specifies the name of an IP prefix list.
route-policy route-policy-name: specifies the name of a routing policy that filters routes based on tags and other protocol parameters.
Precautions
IS-IS can use this command to determine whether a route is added to the routing table. However, LSP transmission is not affected.
The filter-policy export command takes effect only on routes imported using the import-route command.
+IP-Extended indicates that the wide metric is supported. The symbol * indicates that the route is learned through route leaking.
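A minimal sketch of import filtering with an IP prefix list (the list name and prefix are assumptions):

```
#
ip ip-prefix PFX-ALLOW index 10 permit 10.0.0.0 24
#
isis 1
 # Routes failing the prefix list stay in the LSDB and are
 # still flooded in LSPs, but are not installed in the
 # local IP routing table.
 filter-policy ip-prefix PFX-ALLOW import
#
```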

Case Description

IS-IS authentication is classified into area authentication, routing domain authentication, and interface authentication.

Command Usage

The area-authentication-mode command configures an IS-IS area to authenticate received Level-1 packets (LSPs and SNPs) using the specified authentication mode and password, and to add authentication information to Level-1 packets to be sent.

The isis authentication-mode command configures an IS-IS interface to authenticate Hello packets using the specified mode and password.
View

area-authentication-mode: IS-IS view

isis authentication-mode: interface view


Parameters

isis authentication-mode { simple password | md5 password-key | keychain keychain-name } [ level-1 | level-2 ] [ ip | osi ] [ send-only ]
simple password: indicates that the password is transmitted in plain text.
md5 password-key: indicates that the password to be transmitted is encrypted using MD5.
keychain keychain-name: specifies a keychain that changes with time.
level-1: sets Level-1 authentication.
level-2: sets Level-2 authentication.
ip: indicates the IP authentication password. This parameter cannot be configured in keychain authentication mode.
osi: indicates the OSI authentication password. This parameter cannot be configured in keychain authentication mode.

send-only: indicates that the router encapsulates sent Hello packets with authentication information but does not authenticate received Hello packets.

area-authentication-mode { simple password | md5 password-key | keychain keychain-name } [ ip | osi ] [ snp-packet { authentication-avoid | send-only } | all-send-only ]
simple password: indicates that the password is transmitted in plain text.
md5 password-key: indicates that the password to be transmitted is encrypted using MD5.
keychain keychain-name: specifies a keychain that changes with time.
ip: indicates the IP authentication password. This parameter cannot be configured in keychain authentication mode.
osi: indicates the OSI authentication password. This parameter cannot be configured in keychain authentication mode.
snp-packet: applies authentication to SNPs.
send-only: indicates that the router encapsulates sent SNPs with authentication information but does not authenticate received SNPs.
all-send-only: indicates that the router encapsulates generated LSPs and SNPs with authentication information but does not authenticate received LSPs and SNPs.
authentication-avoid: indicates that the router neither encapsulates generated SNPs with authentication information nor authenticates received SNPs. The router still encapsulates generated LSPs with authentication information and authenticates received LSPs.
Precautions

The area-authentication-mode command takes effect only on Level-1 and Level-1-2 routers.
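A minimal sketch combining both authentication scopes (the password is an assumption):

```
#
isis 1
 # Area authentication protects Level-1 LSPs and SNPs.
 area-authentication-mode md5 Huawei@123
#
interface GigabitEthernet0/0/1
 isis enable 1
 # Interface authentication protects Hello packets.
 isis authentication-mode md5 Huawei@123 level-1
#
```

Both ends of a link must use the same mode and password; otherwise the neighbor relationship cannot be established.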

Case Description

In this case, the addresses for interconnecting devices are as follows: if RX interconnects with RY, their interconnection addresses are XY.1.1.X and XY.1.1.Y respectively, with a 24-bit network mask.

R2 connects to R3 and R1 through serial interfaces. R1 and R3 connect through Ethernet interfaces. R1 connects to network 10.0.0.0/24 through G0/0/1.

Results

You can run the display isis peer command to check whether
neighbor relationships are established successfully.

Results

You can run the display isis interface command to view interface information.

Results

You can run the display ip routing-table command to view the routing table.

Case Description

In this case, the network runs IS-IS.


Requirement analysis

The log prompt function of IS-IS is disabled by default.

Results

The nexthop command sets the preferences of equal-cost routes.

After IS-IS calculates equal-cost routes using the SPF algorithm, the next hop is chosen from these equal-cost routes based on the value of weight. The smaller the value, the higher the preference.
Parameters

nexthop ip-address weight value
ip-address: indicates the next hop address.
weight value: indicates the next hop weight. The value is an integer that ranges from 1 to 254. If no weight is configured for a next hop, the default value 255 is used.
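A minimal sketch (the next hop address is an assumption):

```
#
isis 1
 # Among equal-cost next hops, prefer 12.1.1.2
 # (a smaller weight means a higher preference).
 nexthop 12.1.1.2 weight 1
#
```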

Results

The summary ip-address mask [ avoid-feedback | generate_null0_route ] command summarizes routes. The avoid-feedback keyword prevents the router from learning the summary route back through SPF calculation, and generate_null0_route generates a route to the Null0 interface to prevent loops.
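For example (the summary prefix is an assumption):

```
#
isis 1
 # Advertise one summary for the 172.16.0.0/16 block and
 # install a local Null0 route to guard against loops.
 summary 172.16.0.0 255.255.0.0 generate_null0_route
#
```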

You need to manually enable logging of neighbor state changes.

OSPF topology:
OSPF divides an Autonomous System (AS) into one or multiple logical areas. All areas are connected to Area 0, the backbone area.

Router type:
Internal router: All interfaces on an internal router belong to the
same OSPF area.
Area Border Router (ABR): An ABR belongs to two or more
areas, one of which must be the backbone area. An ABR is
used to connect the backbone area and non-backbone areas.
It can be physically or logically connected to the backbone
area.
Backbone router: At least one interface on a backbone router
belongs to the backbone area. Internal routers in Area 0 and
all ABRs are backbone routers.
AS Boundary Router (ASBR): An ASBR exchanges routing
information with other ASs. An ASBR does not necessarily
reside on the border of an AS. It can be an internal router or an
ABR. An OSPF device that has imported external routing
information will become an ASBR.
Differences between OSPF and IS-IS in the topology:
In OSPF, a link can belong to only one area. In IS-IS, a link can belong to different areas.

In IS-IS, no area is physically defined as the backbone or non-backbone area. In OSPF, Area 0 is defined as the backbone area.
In IS-IS, Level-1 and Level-2 routers use the shortest path
first (SPF) algorithm to generate shortest path trees (SPTs)
respectively. In OSPF, the SPF algorithm is used only in the
same area, and inter-area routes are forwarded by the
backbone area.

OSPF supports the following network types:


P2P: A network where the link layer protocol is PPP or HDLC
is a P2P network by default. On a P2P network, protocol
packets such as Hello packets, DD packets, LSR packets,
LSU packets, and LSAck packets are sent in multicast mode
using the multicast address 224.0.0.5.
P2MP: No network is a P2MP network by default, no matter
what type of link layer protocol is used on the network. A
network can be changed to a P2MP network. The common
practice is to change a non-fully meshed NBMA network to a
P2MP network. On a P2MP network, Hello packets are sent in
multicast mode using the multicast address 224.0.0.5, and
other types of protocol packets, such as DD packets, LSR
packets, LSU packets, and LSAck packets are sent in unicast
mode.
NBMA: A network where the link layer protocol is ATM or FR is
an NBMA network by default. On an NBMA network, protocol
packets such as Hello packets, DD packets, LSR packets,
LSU packets, and LSAck packets are sent in unicast mode.
Broadcast: A network with the link layer protocol of Ethernet or FDDI is a broadcast network by default. On a broadcast network, Hello packets, LSU packets, and LSAck packets are usually sent in multicast mode. The multicast address 224.0.0.5 is used by all OSPF devices, and the multicast address 224.0.0.6 is reserved for the OSPF designated router (DR) and backup designated router (BDR). DD and LSR packets are transmitted in unicast mode.
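When the default type does not fit (for example, a non-fully meshed NBMA network), the interface network type can be changed; a minimal sketch (the interface name is an assumption):

```
#
interface Serial1/0/0
 # Treat this NBMA link as P2MP, the common fix for
 # non-fully meshed topologies; Hello packets are then
 # sent in multicast mode.
 ospf network-type p2mp
#
```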

DR/BDR functions
Reduces the number of adjacencies and thus the amount of link-state and routing information exchanged. A DRother establishes full adjacency only with the DR and BDR. The DR and BDR establish full adjacency with each other.
The DR generates Network-LSAs to describe information about
the NBMA or broadcast network segment.

DR/BDR election rules


When Hello is used for DR/BDR election, the DR/BDR is
elected based on Router Priority of interfaces.
If Router Priority is set to 0, the router cannot be elected as
the DR or BDR.
A larger value of Router Priority indicates a higher priority. If
the value of Router Priority is the same on two interfaces, the
interface with a larger Router ID is elected.
DR/BDR election is non-preemptive: a router that joins later cannot take over the DR/BDR role even if it has a higher priority.
If the DR is faulty, the BDR automatically becomes the new DR,
and a new BDR is elected on the network. If the BDR is faulty,
the DR does not change, and a new BDR is elected.
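The priority rule above can be sketched as follows (the interface name is an assumption):

```
#
interface GigabitEthernet0/0/1
 # Priority 0 removes this router from OSPF DR/BDR election
 # on the segment (unlike the IS-IS DIS election, where
 # priority 0 routers still participate).
 ospf dr-priority 0
#
```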
Differences between IS-IS DIS and OSPF DR/BDR
On an IS-IS broadcast network, routers with priority 0 still participate in DIS election. On an OSPF network, routers with priority 0 do not participate in DR election.
On an IS-IS broadcast network, when a new router meeting
DIS conditions joins the network, the router is elected as the
new DIS, and the original pseudonode is deleted. This causes
LSP flooding. On an OSPF network, a new router will not
immediately become the DR on the network segment even if
the router has the highest DR priority.
On an IS-IS broadcast network, routers with the same level on
the same network segment form adjacencies with each other,
including all non-DIS routers.

Overview of OSPF packets


OSPF packets are transmitted at the network layer. The
protocol number is 89. There are five types of OSPF packets,
whose packet headers are in the same format.
OSPF packets except the Hello packet carry LSA information.
OSPF packet header information
All OSPF packets have the same OSPF packet header.
Version: specifies the OSPF version number. For OSPFv2, this field is set to 2.
Type: specifies the OSPF packet type. There are five types of
OSPF packets.
Packet length: specifies the total length of an OSPF packet,
including the packet header. The unit is byte.
Router ID: specifies the router ID of the router generating the packet.
Area ID: specifies the area to which the packet is to be
advertised.
Checksum: specifies the standard IP checksum of the entire
packet (including the packet header).
AuType: specifies the authentication mode
Authentication: specifies information for authenticating packets,
such as the password.
Hello packet
Network Mask: specifies the network mask of the interface
sending Hello packets.

HelloInterval: specifies the interval for sending Hello packets, in seconds.
Options: specifies optional functions supported by the OSPF
router sending the Hello packet. Detailed functions are not
mentioned in this course.
Rtr Pri: specifies the router priority on the interface sending
Hello packets. This field is used for electing the DR and BDR.
RouterDeadInterval: specifies the interval for advertising that
the neighbor router does not run OSPF on the network
segment, in seconds. In most cases, the value of this field is
four times HelloInterval.
Designated Router: specifies the IP address of the DR elected
by routers sending Hello packets. The value 0.0.0.0 of this field
indicates that the DR is not elected.
Backup Designated Router: specifies the IP address of the
BDR elected by routers sending Hello packets. The value
0.0.0.0 of this field indicates that the BDR is not elected.
Neighbor: specifies the neighbor router ID, indicating that the
router has received valid Hello packets from neighbors.

DD packet
Interface MTU: specifies the maximum IP data packet size that
an interface on the originating router can send without
fragmentation. The value of this field is 0x0000 on a virtual link.
Options: is the same as that of the Hello packet.
I-bit: is set to 1 for the first DD packet in a series of sent DD
packets. The I-bit fields of subsequent DD packets are 0.
M-bit: is set to 1 when the sent DD packet is not the last one.
The M-bit field of the last DD packet is set to 0.
MS-bit: advertises the router as the master router.
DD Sequence Number: specifies the sequence number of the
DD packet.
LSR packet
Link State Advertisement Type: specifies the LSA type, which
can be router-LSA, network-LSA, or other LSA types.
Link State ID: varies depending on LSA types.
Advertising Router: specifies the router ID of the originating
router that advertises LSAs.
LSU packet
Number of LSAs: specifies the number of LSAs in an LSU
packet.
LSA: specifies detailed LSA information.

LSAck packet
Header of LSA: specifies LSA header information. An LSAck packet carries a list of LSA headers to acknowledge received LSAs.

LSA header information contained in all OSPF packets excluding Hello packets

LS age: specifies the age of the LSA, in seconds.

Options: specifies optional capabilities supported by the portion of the routing domain described by the LSA.

LS type: identifies the format and function of the LSA. There are five types of commonly used LSAs.

Link State ID: varies with LSAs.

Advertising Router: specifies the router ID of the router that originated the LSA.

Sequence Number: increases as new instances of the LSA are generated. This field allows other routers to identify the latest LSA instance.

Checksum: indicates the checksum of all information in an LSA. The LS age field is excluded from the checksum, so the checksum does not need to be recalculated as the aging time increases.

Length: specifies the length of an LSA, including the LSA header.


Router-LSA (describing all interfaces or links on the originating router)

Link State ID: specifies the router ID of the originating router.

V: indicates that the originating router is an endpoint of one or more fully adjacent virtual links when this field is set to 1.

E: is set to 1 when the originating router is an ASBR.

B: is set to 1 when the originating router is an ABR.

Number of links: specifies the number of router links described in the LSA.

Link Type: indicates the link type. The value of this field can be:
1: P2P link, a point-to-point connection to another router
2: link to a transit network, such as a broadcast or NBMA network
3: link to a stub network or subnet, such as a loopback interface
4: virtual link
Link ID: specifies the link ID. The value depends on the link type:
1: neighbor router ID
2: IP address of the interface on the DR
3: IP network or subnet address
4: neighbor router ID
Link Data: provides additional information about a link. When Link Type is 1 or 2, this field specifies the IP address of the interface on the originating router connected to the network; when Link Type is 3, it specifies the network mask of the subnet.
ToS: is not supported.
Metric: specifies the metric of the link or interface.

Network-LSA

Link State ID: specifies the IP address of the interface on the DR.

Network Mask: specifies the subnet mask used on the network.

Attached Router: lists the router IDs of the DR and of all routers that have established full adjacency with the DR on the broadcast or NBMA network.
Network-summary-LSA and ASBR-summary-LSA

Link State ID: specifies the IP address of the network or subnet in a Type 3 LSA. In a Type 4 LSA, this field specifies the router ID of the ASBR.

Network Mask: specifies the subnet mask of the network in a Type 3 LSA. In a Type 4 LSA, this field has no meaning and is set to 0.0.0.0.

Metric: specifies the metric of the route to the destination.


AS-external-LSA

Link State ID: indicates the IP address of the advertised network or subnet.

Network Mask: specifies the subnet mask of the advertised destination.
E: specifies the type of the external route. The value 1 indicates the E2 metric, and the value 0 indicates the E1 metric.
Metric: specifies the metric of the route and is set by the ASBR.
Forwarding Address: specifies the forwarding address (FA) of packets destined for the advertised destination. When this field is set to 0.0.0.0, packets are forwarded to the originating router (the ASBR).
External Route Tag: identifies an external route.

NSSA LSA

Forwarding Address: When an internal route is advertised between an NSSA ASBR and the neighboring AS, this field is set to the next-hop address on the local network. When no such route is advertised, this field is set to the IP address of an interface on a stub network, such as a loopback interface; if multiple stub networks exist, the highest IP address is chosen.

Options field:

DN: prevents loops on an MPLS VPN network. When a Type 3, 5, or 7 LSA is sent from a PE to a CE, the DN bit MUST be set. When the PE receives, from a CE router, a Type 3, 5, or 7 LSA with the DN bit set, the information from that LSA MUST NOT be used during OSPF route calculation.

O: indicates that the originating router supports Opaque LSAs (Type 9, 10, and 11 LSAs).

DC-bit: indicates that the originating router supports OSPF over on-demand links.

EA: indicates that the originating router can receive and forward External-Attributes-LSAs (Type 8 LSAs).

N-bit: exists only in Hello packets. The value 1 indicates that the router supports Type 7 LSAs. The value 0 indicates that the router does not receive or send NSSA LSAs.

P-bit: exists only in NSSA LSAs. This field instructs the NSSA ABR to translate the Type 7 LSA into a Type 5 LSA.

MC-bit: set when the originating router supports multicast extensions to OSPF (MOSPF).

E-bit: indicates that the originating router can receive AS-external LSAs. This field is set to 1 in all Type 5 LSAs and in LSAs sent from the backbone area and NSSA areas, and set to 0 in LSAs sent from stub areas. In a Hello packet, this field indicates that the interface can receive and send Type 5 LSAs.

MT-bit: indicates that the originating router supports multi-topology routing.

Neighbor status:

Down: the initial state of a neighbor session. In this state, the router has received no Hello packet from its neighbor.

Init: the router has received Hello packets from its neighbor, but its own router ID is not yet listed in the Neighbor field of those Hello packets, so bidirectional communication has not been established.

2-Way: In this state, bidirectional communication has been established, but the router has not established an adjacency with the neighbor. This is the highest state reached before adjacency establishment begins. On a broadcast or NBMA network, the routers elect the DR and BDR in this state.

When the neighbor relationship is established, routers check the parameters carried in Hello packets.

If the network type of the interface receiving Hello packets is P2MP or NBMA, the Network Mask field in the Hello packets must be the same as the network mask of the receiving interface. If the network type of the interface is P2P or virtual link, the Network Mask field is not checked.

The HelloInterval and RouterDeadInterval fields in a Hello packet must be the same as those configured on the interface receiving the Hello packet.

The Authentication field in a Hello packet must be the same as that configured on the interface receiving the Hello packet.

The E-bit in the Options field of a Hello packet must match the area configuration of the receiving interface (for example, whether the area is a stub area).

The Area ID field in a Hello packet must be the same as that of the interface receiving the Hello packet.

Neighbor relationship setup:

When the neighbor state machine is ExStart on R1, R1 sends the first DD packet to R2. Assume that the fields in this DD packet are set as follows:
DD Sequence Number is set to 552A.
I-bit is set to 1, indicating that this is the first DD packet.
M-bit is set to 1, indicating that more DD packets are to be sent.
MS-bit is set to 1, indicating that R1 claims the master role.

When the neighbor state machine is ExStart on R2, R2 sends its first DD packet, in which DD Sequence Number is set to 5528, to R1. The router ID of R2 is larger than that of R1; therefore, R2 becomes the master. After the comparison of router IDs is complete, R1 generates a NegotiationDone event and changes its neighbor state machine from ExStart to Exchange.

When the neighbor state machine is Exchange on R1, R1 sends a new DD packet describing the local LSDB. In the DD packet, DD Sequence Number is set to the sequence number of the DD packet sent by R2, M-bit is set to 0 indicating that no more DD packets are required to describe the local LSDB, and MS-bit is set to 0 indicating that R1 accepts the slave role. After receiving the DD packet, R2 generates a NegotiationDone event and changes its neighbor state machine to Exchange.

When the neighbor state machine is Exchange on R2, R2 sends a new DD packet describing the local LSDB. In this DD packet, DD Sequence Number is increased by 1 (5528 + 1 = 5529).

As the slave, R1 must acknowledge each DD packet from R2 even though R1 does not need to update its LSDB using the new DD packets. R1 therefore sends an empty DD packet with DD Sequence Number 5529.
When the neighbor state machine is Loading on R1, R1 sends a
Link State Request (LSR) packet to request link state information
that is learned from DD packets when the neighbor state machine
is Exchange but not contained in the local LSDB.
After receiving the LSR packet, R2 sends a Link State Update
(LSU) packet containing detailed link state information to R1.
When receiving the LSU packet, R1 changes its neighbor state
machine from Loading to Full.
R1 then sends a Link State Acknowledgement (LSAck) packet to
R2 to ensure information transmission reliability. LSAck packets
are flooded to acknowledge the receiving of LSAs.

OSPF can define areas as stub and totally stub areas. A stub area is a
special area where ABRs do not flood the received AS external routes.
Routers in a stub area maintain fewer routing entries and transmit less routing information. The stub area is an optional configuration, but
not all areas can be configured as stub areas. Generally, a stub area is
a non-backbone area with only one ABR and is located at the AS
boundary. To ensure the reachability of AS external routes, the ABR in
a stub area generates a Type 3 LSA carrying a default route and
advertises it within the entire stub area.

Stub area

The backbone area cannot be configured as a stub area.

If an area needs to be configured as a stub area, all the routers in
this area must be configured with stub attributes.

An ASBR cannot exist in a stub area. That is, AS external routes
are not flooded in the stub area.

A virtual link cannot pass through a stub area.

Type 5 LSAs cannot be advertised within a stub area.

A router in the stub area must learn AS external routes from the
ABR. The ABR automatically generates a Type 3 LSA carrying a
default route and advertises it within the entire stub area. The
router can then learn the AS external network from the ABR.
Totally stub area

Neither Type 3 nor Type 5 LSAs can be advertised within a totally
stub area.

A router in the totally stub area must learn AS external and
inter-area routes from an ABR. The ABR automatically generates a
Type 3 LSA carrying a default route and advertises it within the
entire totally stub area.

To prevent a large number of external routes from consuming the
bandwidth and storage resources of routers in a stub area, OSPF
defines that stub areas cannot import external routes. However, stub
areas cannot meet the requirements of scenarios that require the
import of external routes while preventing resources from being
consumed by external routes. Therefore, NSSA areas are introduced.
Type 7 LSA

Type 7 LSAs are defined in an NSSA area to describe AS
external routes.

Type 7 LSAs are generated by an ASBR in an NSSA area and
advertised only within the NSSA area of this ASBR.

When receiving Type 7 LSAs, an ABR in an NSSA area selectively
translates the Type 7 LSAs to Type 5 LSAs so that external
routes can be advertised in other areas of the OSPF network.

Type 7 LSAs can be used to carry default route information to
guide traffic to other ASs.

To advertise the external routes imported by an NSSA area to other
areas, ABRs in the NSSA area need to translate Type 7 LSAs to Type
5 LSAs so that the external routes can be advertised on the entire
OSPF network.

The P-bit informs routers whether Type 7 LSAs need to be
translated.

The ABR with the largest router ID in an NSSA area translates
Type 7 LSAs to Type 5 LSAs.

A Type 7 LSA can be translated to a Type 5 LSA only when the
P-bit is set and Forwarding Address is not 0. Forwarding Address
indicates the address within the OSPF domain to which packets
destined for the external routes should be forwarded.

The default Type 7 LSAs meeting the preceding conditions can
also be translated.

The Type 7 LSAs generated by ABRs are not set with the P-bit.
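The translation conditions above can be reduced to a minimal predicate. This is an illustrative sketch; a real implementation reads the P-bit from the LSA options and the forwarding address from the LSA body.

```python
# Hypothetical check for whether the NSSA translator may convert a Type 7
# LSA into a Type 5 LSA: the P-bit must be set and the FA must be non-zero.
def translatable_to_type5(p_bit_set, forwarding_address):
    return p_bit_set and forwarding_address != "0.0.0.0"
```

For example, a Type 7 LSA originated by an ABR (P-bit clear) is never translated.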

Precautions

Multiple ABRs may be deployed in an NSSA area. To prevent
routing loops, ABRs do not calculate the default routes advertised
by each other.

NSSA and totally NSSA

A small number of AS external routes learned from the ASBR in an
NSSA area can be imported to the NSSA area. Type 5 LSAs
cannot be advertised within the NSSA area, but routers can learn
the AS external routes from the ASBR.

Neither Type 3 nor Type 5 LSAs can be advertised within a totally
NSSA area.
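The flooding rules for the four special area types can be summarized in one reference table. This is a sketch for quick comparison: "type3" here means ordinary inter-area LSAs; the default-route Type 3 LSA that the ABR originates into stub, totally stub, and totally NSSA areas is excluded from the table.

```python
# True = that LSA type is flooded into the area (default-route LSAs excluded).
AREA_LSA_RULES = {
    "stub":         {"type3": True,  "type5": False, "type7": False},
    "totally-stub": {"type3": False, "type5": False, "type7": False},
    "nssa":         {"type3": True,  "type5": False, "type7": True},
    "totally-nssa": {"type3": False, "type5": False, "type7": True},
}
```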

Fast convergence

I-SPF improves this algorithm. Except when calculation is
performed for the first time, only changed nodes, as opposed to
all nodes, are involved in calculation. The SPT ultimately
generated is the same as that generated by the previous
algorithm. This decreases the CPU usage and speeds up network
convergence.

Similar to I-SPF, PRC calculates only the changed routes. PRC,
however, does not calculate the shortest path. PRC updates
routes based on the SPT calculated by I-SPF. In route calculation,
a leaf represents a route, and a node represents a router. A
change in the SPT or in a leaf causes a change in routing
information, but SPT changes and leaf changes are independent
of each other. PRC processes routing information based on the
SPT or leaf changes:

When the SPT is changed, PRC processes routing
information on all leaves of the changed nodes.

When the SPT is not changed, PRC does not process
routing information on nodes.

When a leaf is changed, PRC processes routing
information on the changed leaf.

When the leaf is not changed, PRC does not process
routing information on the leaf.

The OSPF intelligent timer controls the route calculation, LSA
generation, and receiving of LSAs to speed up network
convergence. The OSPF intelligent timer speeds up network
convergence in the following modes:

On a network where routes are frequently calculated, the
OSPF intelligent timer dynamically adjusts the interval for
calculating routes based on the user configuration and exponential
backoff technology. In this manner, the route calculation and
CPU resource consumption are decreased. Routes are
calculated after the network topology becomes stable.

On an unstable network, if a router generates or receives
LSAs due to frequent topology changes, the OSPF
intelligent timer can dynamically adjust the interval for
calculating routes. No LSA is generated or handled within
an interval, which prevents invalid LSAs from being
generated and advertised on the entire network.
The OSPF intelligent timer helps calculate routes as follows:
Based on the local LSDB, a router that runs OSPF
calculates the SPT with itself as the root using the
SPF algorithm, and determines the next hop to the
destination network according to the SPT. Changing
the interval for SPF calculation can prevent the
bandwidth and resource consumption caused by
frequent LSDB changes.
On a network that requires short route convergence
time, specify the interval for route calculation in
milliseconds to increase the route calculation
frequency and speed up route convergence.
When the OSPF LSDB changes, the shortest path
needs to be recalculated. If a network changes
frequently and the shortest path is calculated
continually, a large number of system resources will
be consumed, affecting router performance. You can
configure an intelligent timer and set a proper interval
for SPF calculation to prevent memory and bandwidth
resources from being consumed.
After the OSPF intelligent timer is used:
The initial interval for SPF calculation is
specified by the parameter start-interval.
The interval for SPF calculation for the nth (n
is larger than or equal to 2) time is equal to
hold-interval × 2^(n-1).
When the interval specified by hold-interval ×
2^(n-1) reaches the maximum interval
specified by max-interval, OSPF performs
SPF calculation at the maximum interval for
three consecutive times, and then returns to
the first step, performing SPF calculation at
the initial interval specified by start-interval.
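The backoff schedule above can be computed as follows. This sketch assumes the reconstructed formula hold-interval × 2^(n-1) for the nth calculation (n ≥ 2), capped at max-interval; parameter names mirror the start-interval, hold-interval, and max-interval keywords in the text.

```python
# Interval before the nth SPF calculation under the intelligent timer (sketch).
def spf_interval(n, start_interval, hold_interval, max_interval):
    if n == 1:
        return start_interval
    return min(hold_interval * 2 ** (n - 1), max_interval)

# With start=50 ms, hold=200 ms, max=5000 ms:
# n=1 -> 50, n=2 -> 400, n=3 -> 800, n=4 -> 1600, n=5 -> 3200, n=6 -> 5000
```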

Priority-based convergence

Filter routes based on the IP prefix list. Set different priorities for
the routes so that routes with the highest priority are preferentially
converged, improving network reliability.

Setting the maximum number of non-default external routes on a router
can prevent an OSPF database overflow. You must set the same
maximum number of non-default routes for all routers on an OSPF
network. If the number of external routes on a router reaches the
configured maximum number, the router enters the overflow state and
starts the overflow timer. The router automatically leaves the overflow
state after the overflow timer expires. The default timeout period is 5
seconds.
The OSPF database overflow process is as follows:

When entering the overflow state, a router deletes all non-default
external routes that are generated by itself.

When staying in the overflow state, the router does not generate
non-default external routes, discards newly received non-default
routes, and does not reply with LSAck packets. When the
overflow timer expires, the router checks whether the number of
external routes still exceeds the maximum value. If so, the router
restarts the timer; if not, the router leaves the overflow state.

When leaving the overflow state, the router deletes the overflow
timer, generates non-default external routes, receives new
non-default external routes, replies with LSAck packets, and gets
ready to enter the overflow state again.
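The overflow behaviour above can be modelled as a small state machine. This is a toy sketch with assumed method names; entering overflow also implies deleting self-originated non-default external routes and starting the (default 5-second) overflow timer, which are noted here only as comments.

```python
# Toy model of the OSPF database overflow state (illustrative only).
class OspfOverflow:
    def __init__(self, max_external, timer_seconds=5):
        self.max_external = max_external
        self.timer_seconds = timer_seconds   # default overflow timeout
        self.in_overflow = False

    def on_route_count_change(self, count):
        if not self.in_overflow and count >= self.max_external:
            self.in_overflow = True          # enter overflow, start the timer
        return self.in_overflow

    def on_timer_expiry(self, count):
        if count < self.max_external:
            self.in_overflow = False         # leave overflow, delete the timer
        # otherwise the timer is restarted and the state is kept
        return self.in_overflow
```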

During OSPF deployment, all non-backbone areas must be connected
to the backbone area to ensure that all areas are reachable.

Two ABRs use a virtual link to directly transmit OSPF packets. The
routers between the two ABRs only forward packets. Because the
destination of OSPF packets is not these routers, the routers
transparently forward the OSPF packets as common IP packets.
If a virtual link is not properly deployed, a loop may occur.

When both area-based and interface-based authentication are
configured, the interface-based authentication takes effect.

The OSPF default route is generally applied to the following scenarios:

An ABR in an area advertises Type 3 LSAs carrying the default
route within the area. Routers in the area use the received default
route to forward inter-area packets.

An ASBR in an area advertises Type 5 or Type 7 LSAs carrying
the default route within the AS. Routers in the AS use the
received default route to forward AS external packets.
Precautions

When no exactly matched route is discovered, a router can
forward packets through the default route. Due to hierarchical
management of OSPF routes, the priority of default Type 3 routes
is higher than the priority of default Type 5 or Type 7 routes.

If an OSPF router has advertised LSAs carrying a default route,
the router does not learn this type of LSA carrying a default route
when it is advertised by other routers. That is, the router uses only
the LSAs advertised by itself to calculate routes. The LSAs
advertised by others are still saved in the LSDB.

If a router has to use a route to advertise LSAs carrying an
external default route, the route cannot be a route learned by the
local OSPF process. This is because a router in an area uses
default external routes to forward packets outside the area,
whereas the routes in the AS have the next hop pointing to
devices within the AS.
Principles for advertising default routes in different areas

Common area

By default, OSPF routers in a common OSPF area do not
automatically generate default routes, even if the common
OSPF area has default routes.
NSSA area
To advertise AS external routes using the ASBR in an
NSSA area and advertise other external routes
through other areas, configure a default Type 7 LSA
on the ABR and advertise this LSA in the entire
NSSA area. In this way, a small number of AS
external routes can be learned from the ASBR in the
NSSA, and other inter-area routes can be learned
from the ABR in the NSSA area.
To advertise all the external routes using the ASBR in
the NSSA area, configure a default Type 7 LSA on
the ASBR and advertise this LSA in the entire NSSA
area. In this way, all the external routes are
advertised using the ASBR in the NSSA area.
The preceding configurations are performed using the
same command in different views. The difference
between these two configurations is described as
follows:
An ABR will generate a default Type 7 LSA
regardless of whether the routing table contains the
default route 0.0.0.0.
An ASBR will generate a default Type 7 LSA only
when the routing table contains the default route
0.0.0.0.
An ABR does not translate Type 7 LSAs carrying a
default route into Type 5 LSAs carrying a default
route or flood them to the entire AS.
Totally NSSA area
All routers in the totally NSSA area must learn AS
external routes from the ABR.
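The ABR/ASBR difference described above can be expressed as a single condition. This is a sketch with assumed names: the same command (nssa default-route-advertise) is used in both cases, but only the ASBR requires an existing 0.0.0.0 route before originating the default Type 7 LSA.

```python
# Whether a device originates the default Type 7 LSA after
# "nssa default-route-advertise" is configured (illustrative only).
def originates_default_type7(role, default_route_in_rib):
    if role == "abr":
        return True                      # unconditional on an ABR
    if role == "asbr":
        return default_route_in_rib      # requires 0.0.0.0 in the routing table
    return False
```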

Route filtering
LSAs are not filtered during route learning. Route filtering can
only determine whether calculated routes are added to the
routing table. The learned LSAs are complete.

Precautions
Stub areas and database overflow can also implement the
LSA filtering function.

This figure shows the process of establishing the neighbor relationship
and the process of neighbor status changes.

Down: It is the initial stage of setting up sessions between
neighbors. In this state, a router receives no message from its
neighbor. On an NBMA network, the router can still send Hello
packets to the neighbor with static configurations. PollInterval
specifies the interval for sending Hello packets and its value is
usually the same as the value of RouterDeadInterval.

Attempt: This state exists only on the NBMA network and
indicates that the router receives no message from the neighbor.
In this state, the router periodically sends packets to the neighbor
at an interval of HelloInterval. If the router receives no Hello
packets from the neighbor within RouterDeadInterval, the state
changes to Down.

Init: A router has received Hello packets from its neighbor but is
not in the neighbor list of the received Hello packets. The router
has not established bidirectional communication with its neighbor.
In this state, the router lists the neighbor in the neighbor list of its
own Hello packets.

2-WayReceived: A router knows that bidirectional communication
with the neighbor has started, that is, the router is in the neighbor
list of Hello packets received from the neighbor. If the router
needs to establish an adjacency relationship with the neighbor,
the router enters the ExStart state and starts database
synchronization. If the router does not need to establish an
adjacency relationship with the neighbor, the router enters the
2-Way state.

2-Way: In this state, bidirectional communication has been
established but the router has not established the adjacency
relationship with the neighbor. This is the highest state reached
when no adjacency relationship needs to be established.

1-WayReceived: The router knows that it is not in the neighbor list
of Hello packets received from the neighbor. This is usually
caused by a restart of the neighbor.

The state machines in the figure are described as follows:

ExStart: This is the first step for establishing the adjacency
relationship. In this state, the router starts to send DD packets to
the neighbor. The two neighbors start to negotiate the
master/slave status and determine the sequence numbers of DD
packets. DD packets transmitted in this state do not contain the
local LSDB.

Exchange: The router exchanges DD packets containing the local
LSDB with its neighbor.

Loading: The router exchanges LSR packets with the neighbor for
requesting LSAs and exchanges LSU packets for advertising
LSAs.

Full: The local LSDBs on the two routers have been synchronized.
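The progression above can be reduced to an ordered list so that "highest state" comparisons (for example, 2-Way versus Full) can be made numerically. This is only an ordering sketch: event-driven transitions and the transient *-Received events are omitted.

```python
# Illustrative ordering of OSPF neighbor states (transitions not modelled).
NEIGHBOR_STATES = ["Down", "Attempt", "Init", "2-Way",
                   "ExStart", "Exchange", "Loading", "Full"]

def state_rank(state):
    return NEIGHBOR_STATES.index(state)

def is_forming_adjacency(state):
    """True once a router has moved past 2-Way toward Full."""
    return state_rank(state) > state_rank("2-Way")
```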

OSPF supports P2P, P2MP, NBMA, and broadcast networks. IS-IS
supports only P2P and broadcast networks.

OSPF works only at the network layer, and its protocol number is
89.

When an OSPF neighbor relationship is established, the two
routers check the mask, authentication mode, Hello/dead interval,
and area ID in Hello packets. The conditions for establishing an
IS-IS neighbor relationship are relatively loose.

Establishing a neighbor relationship over an OSPF P2P link
requires a three-way handshake. Establishing an IS-IS neighbor
relationship does not require a three-way handshake. Huawei
devices are enabled with the three-way handshake function on an
IS-IS P2P network by default, which ensures reliability for
establishing the neighbor relationship.
An IS-IS neighbor relationship has level 1 and level 2.
The election of an OSPF DR/BDR is based on the priority and IP
address. The elected DR/BDR cannot be preempted. On an
OSPF network, all DRothers establish full adjacency relationships
with DRs/BDRs, and establish 2-way adjacency relationships with
each other. When the priority of a router on the OSPF network is
0, the router does not participate in the DR/BDR election.
The election of an IS-IS DIS is based on the priority and MAC
address. The elected DIS can be preempted. On an IS-IS network,
all routers establish adjacency relationships with each other. If the
priority of a router on the IS-IS network is 0, the router can still
participate in the DIS election and just has a lower priority.
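The OSPF DR comparison rules above can be sketched as a comparator. This is illustrative only: priority 0 routers are excluded, higher priority wins, and the higher interface IP address breaks ties; the non-preemption of an already elected DR is not modelled here.

```python
# Illustrative OSPF DR comparison (non-preemption not modelled).
def elect_dr(candidates):
    """candidates: list of (priority, ip_as_tuple); returns the winner or None."""
    eligible = [c for c in candidates if c[0] > 0]   # priority 0 never elected
    if not eligible:
        return None
    return max(eligible, key=lambda c: (c[0], c[1]))

routers = [(1, (10, 1, 1, 1)), (10, (10, 1, 1, 2)), (0, (10, 1, 1, 3))]
```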

IS-IS supports few types of LSPs but provides good extension
capabilities through the TLV fields contained in LSPs.

OSPF costs are calculated based on bandwidth. IS-IS
supports the default, delay, expense, and error metric types,
and uses the default metric type in implementation.

Case Description
The NBMA network topology is displayed in this case. Other
devices are connected based on the following rules:
If RX is interconnected with RY, their interconnection
addresses are XY.1.1.X and XY.1.1.Y respectively, with a
24-bit network mask.

Command Usage

The peer command sets the IP address and DR priority of the
neighboring router on an NBMA network. On an NBMA network, a
router cannot discover neighboring routers by broadcasting Hello
packets. You must manually specify IP addresses and DR
priorities of neighboring routers.
View

OSPF view
Parameters

peer ip-address [ dr-priority priority ]


ip-address: specifies the IP address for a neighboring
router.
dr-priority priority: specifies the priority for the neighbor
to select a DR.

Precautions

In the routing table on R3, the routing entry mapping the IP
address 12.1.1.2/32 exists. This is caused by the PPP echo
function. When this function is disabled, the routing entry mapping
this 32-bit IP address does not exist.

Case Description
The network topology in this case is the same as the previous
topology. Area 3 is not directly connected to Area 0, and
therefore cannot communicate with other areas.

Command Usage

The vlink-peer command creates and configures a virtual link.
View

OSPF area view

Parameters

vlink-peer router-id
router-id: specifies the router ID of the virtual link
neighbor.
Configuration Verification

Run the display ospf vlink command to view information about
the OSPF virtual link.
Remarks

A virtual link needs to be configured for R4.

Case Description
The network topology in this case is the same as the previous
topology. Company A requires control on the DR. To meet this
requirement, change the DR priorities of routers. The DR/BDR
cannot be preempted.

Command Usage

The ospf dr-priority command sets the priority of an interface
that participates in the DR election.
View

Interface view
Parameters

ospf dr-priority priority


priority: specifies the priority of an interface that
participates in the DR/BDR election. A larger value
indicates a higher priority.
Precautions
If the DR priority of an interface on a router is 0, the router
cannot be elected as a DR or a BDR. In OSPF, the DR
priority cannot be configured for null interfaces. Note that
the DR/BDR cannot be preempted even if the DR priority is
changed.

Configuration Verification

Run the display ospf peer command to view information about
neighbors in OSPF areas.

Case Description
The network topology in this case is the same as the previous
topology. This is the network extension requirement. On an
OSPF FR network, the default interval for sending Hello
packets is 30 seconds, and the default poll interval is 120
seconds. When the neighbor relationship is invalid, Hello
packets are sent at the poll interval of 120 seconds.

Command Usage

The ospf timer hello command sets the interval for sending Hello
packets on an interface.

The ospf timer poll command sets the poll interval for sending
Hello packets on an NBMA network.
View

ospf timer hello: interface view

ospf timer poll: interface view


Parameters

ospf timer hello interval

interval: specifies the interval for sending Hello packets
on an interface.

ospf timer poll interval

interval: specifies the poll interval for sending Hello
packets.
Precautions
By default, the intervals for sending Hello packets are 10
seconds on P2P and broadcast interfaces and 30 seconds
on P2MP and NBMA interfaces respectively. Ensure that
parameters are set to the same on the local interface and
the remote interface of the neighboring router.

Remarks

On an NBMA network, after the neighbor relationship becomes
invalid, the router sends Hello packets periodically at the
interval specified using the ospf timer poll command. The
poll interval must be at least four times the interval for
sending Hello packets.

Perform the same interface configuration on R4 as that on
R2 and R3.

Case Description
This case is an extension to the original case. Perform
configurations on the basis of the original case. Imported
routes are advertised in E2 mode by default, and the default
cost value is 1.

Command Usage

The import-route command imports routes learned by other
routing protocols.

The ospf cost command sets the cost of a route on an
OSPF-enabled interface.
View

import-route: OSPF view

ospf cost: interface view


Parameters

import-route [ cost cost | type type ]

cost cost: specifies the cost of a route.
type type: specifies the cost type.

ospf cost cost

cost: specifies the cost of an OSPF-enabled interface.
Precautions

On a non-PE device, only EBGP routes are imported after the
import-route bgp command is configured. IBGP routes are also
imported after the import-route bgp permit-ibgp command is
configured. If IBGP routes are imported, routing loops may occur.
In this case, run the preference (OSPF) and preference (BGP)
commands to set the priority of OSPF ASE routes to lower than
that of IBGP routes.

Case Description
This case is an extension to the original case. Perform
configuration on the basis of the original case. If R6 does not
want to receive routes from network 172.16.X.0/24, filter Type
3 LSAs on R5.

Command Usage
The filter-policy export command configures a filtering policy
to filter the imported routes when these routes are advertised
in Type 5 LSAs within the AS. This command can be
configured only on an ASBR to filter Type 5 LSAs.
The filter-policy import command configures a filtering policy
to filter intra-area, inter-area, and AS external routes received
by OSPF. On routers within an area, this command can be
used to filter only routes; on an ABR, this command can be
used to filter Type 3 LSAs.
View
filter-policy export: OSPF view
filter-policy import: OSPF view
Parameters
filter-policy { acl-number | acl-name acl-name | ip-prefix ip-prefix-name } export [ protocol [ process-id ] ]
acl-number: specifies the basic ACL number.
acl-name acl-name: specifies the ACL name.
ip-prefix ip-prefix-name: specifies the name of an IP
prefix list.
protocol: specifies the protocol for advertising routing
information.
process-id: specifies the process ID when RIP, IS-IS, or
OSPF is used for advertising routing information.

filter-policy { acl-number | acl-name acl-name | ip-prefix ip-prefix-name } import


acl-number: specifies the basic ACL number.
acl-name acl-name: specifies the ACL name.
ip-prefix ip-prefix-name: specifies the name of an IP
prefix list.
Precautions
Type 5 LSAs are generated on an ASBR to describe AS
external routes and advertised to all areas (excluding stub and
NSSA areas). The filter-policy command needs to be
configured on an ASBR. To advertise only routing information
meeting specific conditions, run the filter-policy command to
set filtering conditions.

Case Description
This case is an extension to the original case. Perform
configuration on the basis of the original case. Configure Area
1 as an NSSA area.

Command Usage

The nssa command configures an OSPF area as an NSSA area.


View

OSPF area view


Parameters

nssa [ default-route-advertise | flush-waiting-timer interval-value | no-import-route | no-summary | set-n-bit | suppress-forwarding-address | translator-always | translator-interval interval-value | zero-address-forwarding ] *
default-route-advertise: generates default Type 7 LSAs
on an ABR or ASBR and then advertises them to the
NSSA area.
flush-waiting-timer interval-value: specifies the interval
for an ASBR to send aged Type 5 LSAs. This parameter
takes effect for once only.
no-import-route: indicates that no external route is
imported to the NSSA area.
no-summary: indicates that an ABR is prohibited from
sending Type 3 LSAs to the NSSA area.
set-n-bit: sets the N-bit in DD packets.
suppress-forwarding-address: sets the FA of the Type
5 LSAs translated from Type 7 LSAs by the NSSA ABR
to 0.0.0.0.

translator-always: specifies an ABR in an NSSA area as
an all-the-time translator. Multiple ABRs in an NSSA area
can be configured as translators.
translator-interval interval-value: specifies the timeout
period of a translator.
zero-address-forwarding: sets the FA of the generated
NSSA LSAs to 0.0.0.0 when external routes are imported
from an ABR in an NSSA area.

Precautions

The parameter default-route-advertise is configured to advertise
Type 7 LSAs carrying the default route. Regardless of whether the
route 0.0.0.0 exists in the routing table, Type 7 LSAs carrying the
default route will be generated on an ABR. However, on an ASBR,
Type 7 LSAs carrying the default route will be generated only
when the route 0.0.0.0 exists in the routing table.

When the area to which the ASBR belongs is configured as an
NSSA area, invalid Type 5 LSAs from other routers in the area
where LSAs are flooded will be retained. These LSAs will be
deleted only when the aging time reaches 3600 seconds. Router
performance is affected because the forwarding of a large
number of LSAs consumes memory resources. The parameter
flush-waiting-timer is configured to generate Type 5 LSAs with
the aging time of 3600 seconds. Invalid Type 5 LSAs on other
routers are therefore cleared in a timely manner.

The parameter flush-waiting-timer does not take effect when the
ASBR also functions as an ABR. In this way, Type 5 LSAs in
non-NSSA areas will not be deleted.

Case Description
This case is an extension to the original case. Perform
configuration on the basis of the original case. Note that the
virtual link belongs to Area 0.

Command Usage

The authentication-mode command sets the authentication
mode and password for an OSPF area. After this command is
executed, interfaces on all routers in an OSPF area use the same
authentication mode and password.
View

OSPF view
Parameters

authentication-mode { md5 | hmac-md5 } [ key-id { plain plain-text | [ cipher ] cipher-text } ]


md5: indicates MD5 authentication
using the ciphertext password.
hmac-md5: indicates HMAC-MD5 authentication using
the ciphertext password.
key-id: specifies an authentication ID, which must be the
same on the two ends.
keychain: indicates keychain authentication.
keychain-name: specifies the keychain name.

authentication-mode simple [ [ plain ] plain-text | cipher cipher-text ]


simple: indicates simple authentication.
plain: indicates authentication using the plaintext
password. If this parameter is specified, the device
allows you to set only a plaintext key, and the key is
displayed in plaintext mode in the configuration file.

plaintext: specifies a plaintext password.


cipher: specifies a ciphertext password. If this parameter
is specified, the device allows you to set only a ciphertext
key, and the key is displayed in ciphertext mode in the
configuration file.
ciphertext: specifies a ciphertext password.
Precautions

The authentication modes and passwords of all the devices must
be the same in an area, but can be different in different areas.

The authentication-mode command used in the interface view
takes precedence over the authentication-mode command used
in the OSPF area view.

Case Description
If RX is interconnected with RY, their interconnection
addresses are XY.1.1.X/24 and XY.1.1.Y/24 respectively.

Configuration Verification
Run the display ospf peer brief command to check whether
the neighbor relationship is established.

Configuration Verification
Run the tracert command to trace traffic on R3. The command
output shows that traffic on R3 reaches S0/0/0 on R1 through
the Ethernet link.

Configuration Verification
Run the display ip routing-table command to view the routing
table. During the route summarization, original tags are
removed. Therefore, tags need to be added in the next route
summarization.

Case Description
The network runs OSPF.

Analysis
To make R1 select the path through Area 2 to reach the
networks in Area 1, we must make the path through Area 2
work as if it passed through Area 0. A virtual link meets this
need. When the virtual link is established, R1 compares the
costs of the two paths and chooses the path with the lower
cost as the best path.

Configuration Verification
Only the external LSA (10.0.0.0) exists in the LSDB on R2.

Configuration Verification
All neighbor relationships on R3 are correct, indicating
successful authentication.

BGP is a dynamic routing protocol used between ASs. BGP-1 (defined
in RFC 1105), BGP-2 (defined in RFC 1163), and BGP-3 (defined in
RFC 1267) are three earlier-released BGP versions. BGP exchanges
reachable inter-AS routes, establishes inter-AS paths, avoids routing
loops, and applies routing policies between ASs. The current BGP
version is BGP-4, defined in RFC 4271.
As an external routing protocol on the Internet, BGP is widely used
among Internet Service Providers (ISPs).
BGP has the following characteristics:
BGP is an EGP. Different from Interior Gateway Protocols
(IGPs) such as Open Shortest Path First (OSPF) and Routing
Information Protocol (RIP), BGP controls route advertisement
and selects optimal routes between ASs rather than discover
or calculate routes.
BGP uses the Transport Control Protocol (TCP) with listening
port 179 as the transport layer protocol. TCP enhances BGP
reliability without requiring a dedicated mechanism to ensure
connectivity.
BGP needs to select inter-AS routes, which requires
high protocol stability. TCP with high reliability
therefore is used to enhance BGP stability.
BGP peers must be logically connected and establish
TCP connections. The destination port number is 179,
and the local port number is random.

When routes are updated, BGP transmits only the updated
routes. This greatly reduces the bandwidth occupied by BGP
route advertisements. Therefore, BGP applies to the
transmission of a large number of routes on the Internet.
BGP is designed to avoid loops.
Inter-AS: BGP routes carry information about the ASs
along the path. The routes that carry the local AS
number are discarded to avoid inter-AS loops.
Intra-AS: BGP does not advertise the routes learned in
an AS to BGP peers in the AS. In this manner, intra-AS
loops are avoided.
BGP provides rich routing policies to flexibly filter and select
routes.
BGP provides a route flapping prevention mechanism, which
effectively improves Internet stability.
BGP is easy to extend and adapts to network development. It
is mainly extended using TLVs.
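The inter-AS loop-avoidance rule above can be sketched in one predicate: a route whose AS path already contains the receiving router's own AS number is discarded. Function and parameter names are illustrative.

```python
# Illustrative EBGP acceptance check based on the AS path (sketch only).
def accept_from_ebgp_peer(local_as, as_path):
    """Return True if a route received from an EBGP peer may be accepted."""
    return local_as not in as_path
```

For example, AS 100 discards a route whose AS path already lists AS 100, since accepting it would re-enter an AS the route has already traversed.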

An AS is a group of routers that are managed by a single technical
administration and use the same routing policy.

Each AS has a unique AS number, which is assigned by the
Internet Assigned Numbers Authority (IANA).

An AS number ranges from 1 to 65535. Values 1 to 64511 are
registered Internet numbers, while values 64512 to 65535 are
private AS numbers.

Each AS on a BGP network is assigned a unique AS number to
identify the AS. Currently, 2-byte and 4-byte AS numbers are
available. A 2-byte AS number ranges from 1 to 65535, while a
4-byte AS number ranges from 1 to 4294967295. Devices supporting
4-byte AS numbers are compatible with devices supporting 2-byte
AS numbers.
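The 2-byte AS number ranges above can be expressed as a small classifier. The 4-byte space (1 to 4294967295) mentioned in the text is not modelled here; this sketch covers only the 2-byte ranges.

```python
# Classify a 2-byte AS number per the ranges in the text (sketch only).
def classify_as_number(asn):
    if 1 <= asn <= 64511:
        return "public"      # registered Internet AS numbers
    if 64512 <= asn <= 65535:
        return "private"
    raise ValueError("outside the 2-byte AS number range")
```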

EBGP and IBGP
IBGP: runs within an AS. To prevent routing loops within an AS, a
BGP device does not advertise the routes learned from an IBGP
peer to other IBGP peers, and establishes full-mesh connections
with all the IBGP peers.
EBGP: runs between ASs. To prevent routing loops between ASs, a
BGP device discards routes containing the local AS number when
receiving routes from EBGP peers.
Device roles in BGP message exchange
Speaker: The device that sends BGP messages is called a BGP
speaker. The speaker receives and generates new routes, and
advertises the routes to other BGP speakers.
Peer: The speakers that exchange messages with each other are
called BGP peers. A group of peers sharing the same policies can
form a peer group.

BGP peers exchange five types of messages: Open, Update, Keepalive,
Notification, and Route-Refresh messages.
Open message: is used to establish BGP peer relationships. It is
the first message sent after a TCP connection is set up. After a
BGP peer receives an Open message and the peer negotiation
succeeds, the BGP peer sends a Keepalive message to confirm
and maintain the peer relationship. Subsequently, BGP peers can
exchange Update, Notification, Keepalive, and Route-refresh
messages.
Update message: is used to exchange routes between BGP peers.
Update messages can be used to advertise multiple reachable
routes with the same attributes or to withdraw multiple unreachable
routes.
An Update message can be used to advertise multiple
reachable routes with the same attributes. These
routes can share a group of route attributes. The route
attributes in an Update message apply to all the
destination addresses (expressed by IP prefixes) in the
Network Layer Reachability Information (NLRI) field of
the Update message.
An Update message can be used to withdraw multiple
unreachable routes. Each route is identified by its
destination address (expressed by an IP prefix), which
identifies the routes previously advertised between
BGP speakers.

An Update message can be used only to withdraw
routes. In this case, it does not need to carry route
attributes or NLRI. Similarly, an Update message can
be used only to advertise reachable routes, so it does
not need to carry information about withdrawn routes.
Keepalive message: is periodically sent to the BGP peer to
maintain the peer relationship.
Notification message: is sent to the BGP peer when an error is
detected. The BGP connection is then terminated immediately.
Route-Refresh message: is used to request the BGP peer resend
routes when the BGP inbound routing policy changes. If all BGP
routers have the Route-Refresh capability, the local BGP router
sends a Route-Refresh message to BGP peers when the BGP
inbound routing policy changes. After receiving the Route-Refresh
message, the BGP peers resend their routing information to the
local BGP router. In this manner, the BGP routing table can be
dynamically updated, and the new routing policy can be used
without terminating BGP connections. A BGP peer notifies its peer
of its Route-Refresh capability by sending an Open message.
BGP message applications
BGP uses TCP port 179 to set up a connection. BGP connection
setup requires a series of dialogues and handshakes. During
handshake negotiation, BGP advertises parameters such as the
BGP version, BGP connection holdtime, local router ID, and
authorization information in an Open message.
After a BGP connection is set up, a BGP router sends the BGP
peer an Update message that carries the attributes of a route to be
advertised. This helps the BGP peer select the optimal route. When
local BGP routes change, a BGP router sends an Update message
to notify the BGP peer of the changes.
After two BGP peers exchange routes for a period of time, they do
not have new routes to be advertised and need to periodically send
Keepalive messages to maintain the validity of the BGP connection.
If the local BGP router does not receive any BGP message from the
BGP peer within the holdtime, the local BGP router considers that
the BGP connection has been terminated, tears down the BGP
connection, and deletes all the BGP routes learned from the peer.
When the local BGP router detects an error during the operation, for
example, it does not support the peer BGP version or receives an
invalid Update message, it sends the BGP peer a Notification
message to report the error. Before terminating a BGP connection
with the peer, the local BGP router also needs to send a Notification
message to the peer.
BGP message header
Marker: A 16-byte field in which all bits are set to 1.

Length: A 2-byte unsigned integer that indicates the total length of a
message, including the header.
Type: A 1-byte field that specifies the type of a message:
Open
Update
Keepalive
Notification
Route-Refresh

Open message format
Version: Indicates the BGP version number. For BGPv4, the value
is 4.
My Autonomous System: Indicates the local AS number.
Comparing the AS numbers on both ends, you can determine
whether a BGP connection is an IBGP or EBGP connection.
Hold Time: Indicates the time during which two BGP peers maintain
a BGP connection between them. During the peer relationship
setup, two BGP peers need to negotiate the holdtime and keep the
holdtime consistent. If two BGP peers have different holdtime
periods configured, the shorter holdtime is used. If the local BGP
router does not receive a Keepalive message from the peer within
the holdtime, it considers that the BGP connection is terminated. If
the holdtime is 0, no Keepalive message is sent.
BGP Identifier: Indicates the router ID of a BGP router. It is
expressed as an IP address to identify a BGP router.
Opt Parm Len (Optional Parameters Length): Indicates the optional
parameter length. The value 0 indicates that no optional parameters
are available.
Optional Parameters: These are used for BGP authentication or
Multiprotocol Extensions. Each parameter is a 3-tuple (Parameter
Type-Parameter Length-Parameter Value).
Update message format
Withdrawn Routes Length: A 2-byte unsigned integer that indicates
the total length of the Withdrawn Routes field. The value 0 indicates
that the Withdrawn Routes field is not present in this Update
message.
Withdrawn Routes: A variable-length field that contains a list of IP
address prefixes for the routes to be withdrawn. Each IP address
prefix is in <length, prefix> format. For example, <19,198.18.160.0>
indicates a network at 198.18.160.0 255.255.224.0.
Path Attribute Length: A 2-byte unsigned integer that indicates the
total length of the Path Attribute field. The value 0 indicates that the
Path Attribute field is not present in an Update message.
Network Layer Reachability Information: Contains a list of IP
address prefixes. This variable length field is in the same format as
the Withdrawn Routes: <length, prefix>.

Keepalive message format
A Keepalive message has only the message header.
By default, the interval for sending Keepalive messages is 60
seconds, and the holdtime is 180 seconds. Each time a BGP router
receives a Keepalive message from its peer, it resets the hold timer.
If the hold timer expires, it considers the peer to be 'down'.
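On the VRP platform, these timers can be adjusted in the BGP view, either for the whole process or per peer. A minimal sketch (the peer address is hypothetical):

```
bgp 100
 # Set the Keepalive interval to 60s and the holdtime to 180s (the defaults)
 timer keepalive 60 hold 180
 # Or override the timers for a single peer (hypothetical address)
 peer 10.1.1.2 timer keepalive 30 hold 90
```

The negotiated holdtime for a session is the smaller of the two values configured on the peers, as described above.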
Notification message format
Error Code: A 1-byte field that uniquely identifies an error. Each error
code may have one or more error subcodes. If no error subcode is
defined for an error code, the Error Subcode field is all 0s.
Error Subcode: Indicates an error subcode.

A BGP finite state machine (FSM) has six states: Idle, Connect, Active,
OpenSent, OpenConfirm, and Established.
The Idle state is the initial BGP state. In Idle state, a BGP
device refuses all the connection requests from neighbors.
The BGP device initiates a TCP connection with its BGP peer
and changes its state to Connect only after receiving a start
event from the system.
A start event occurs when an operator configures a
BGP process, resets an existing BGP process or when
the router software resets a BGP process.
If an error occurs in any FSM state, for example, the
BGP device receives a notification message or TCP
connection termination notification, the BGP device
returns to the Idle state.
In the Connect state, the BGP device starts the ConnectRetry
timer and waits to establish a TCP connection. The
ConnectRetry timer defaults to 32 seconds.
If a TCP connection is established, the BGP device
sends an open message to the peer and changes to
the OpenSent state.
If a TCP connection fails to be established, the BGP
device moves to the Active state.
If the BGP device does not receive a response from the
peer before the ConnectRetry timer expires, the BGP
device attempts to establish a TCP connection with
another peer and stays in the Connect state.

If another event (started by the system or operator)
occurs, the BGP device returns to the Idle state.
In the Active state, the BGP device keeps trying to establish a
TCP connection with the peer.
If a TCP connection is established, the BGP device
sends an open message to the peer, closes the
ConnectRetry timer, and changes to the OpenSent
state.
If a TCP connection fails to be established, the BGP
device stays in the Active state.
If the BGP device does not receive a response from the
peer before the ConnectRetry timer expires, the BGP
device returns to the Connect state.
In the OpenSent state, the BGP device waits for an Open
message from the peer and then checks the validity of the
received Open message, including the AS number, version,
and authentication password.
If the received Open message is valid, the BGP device
sends a Keepalive message and changes to the
OpenConfirm state.
If the received Open message is invalid, the BGP
device sends a Notification message to the peer and
returns to the Idle state.
In OpenConfirm state, the BGP device waits for a Keepalive or
Notification message from the peer. If the BGP device receives
a Keepalive message, it transitions to the Established state. If
it receives a Notification message, it returns to the Idle state.
In Established state, the BGP device exchanges Update,
Keepalive, Route-Refresh, and Notification messages with the
peer.
If the BGP device receives a valid Update or Keepalive
message, it considers that the peer is working properly
and maintains the BGP connection with the peer.
If the BGP device receives an invalid Update or Keepalive
message, it sends a Notification message to the peer
and returns to the Idle state.
If the BGP device receives a Route-refresh message, it
does not change its state.
If the BGP device receives a Notification message, it
returns to the Idle state.
If the BGP device receives a TCP connection
termination notification, it terminates the TCP
connection with the peer and returns to the Idle state.

A BGP device adds optimal routes to the BGP routing table to generate
BGP routes. After establishing a BGP peer relationship with a neighbor,
the BGP device follows the following rules to exchange routes with the
peer:

Advertises the BGP routes received from IBGP peers
only to its EBGP peers.
Advertises the BGP routes received from EBGP peers
to all its EBGP peers and IBGP peers.
Advertises the optimal route to its peers when there
are multiple valid routes to the same destination.
Sends only updated BGP routes when BGP routes
change.

BGP routing information processing
When receiving Update messages from peers, a BGP router
saves the Update messages to the routing information base
(RIB) and specifies the Adj-RIB-In of the peer from which the
Update messages are received. After these Update messages
are filtered by the inbound policy engine, the BGP router
determines the optimal route for each prefix according to the
route selection algorithm.
The optimal routes are saved in the local BGP RIB (Loc-RIB)
and then submitted to the local IP routing table (IP-RIB).
In addition to the optimal routes received from peers, Loc-RIB
also contains the BGP prefixes that are selected as the optimal
routes and injected by the current router (locally originated
routes). Before the routes in Loc-RIB are advertised to other
peers, these routes must be filtered by the outbound policy
engine. Only the routes that pass this filtering are installed
in the Adj-RIB-Out.

Synchronization is performed between IBGP and an IGP to prevent
misleading routers in other ASs.
Topology description (when synchronization is enabled)
R4 learns the route to 10.0.0.0/24 advertised by R1 through
BGP and checks whether local IGP routing tables contain the
route. If so, R4 advertises the route to R5. If not, R4 does not
advertise the route to R5.
Precautions: On the VRP platform, synchronization is disabled
by default and cannot be enabled. Leaving synchronization
disabled is safe only when either of the following conditions is met:
The local AS is not a transit AS.
All the routers within the local AS set up full-mesh IBGP
connections.

BGP route attributes are a set of parameters that further describe BGP
routes. Using BGP route attributes, BGP can filter and select routes.
Common attributes are as follows:
Origin: A well-known mandatory attribute.
AS_Path: A well-known mandatory attribute.
Next_Hop: A well-known mandatory attribute.
Local_Pref: A well-known discretionary attribute.
Community: An optional transitive attribute.
MED: An optional non-transitive attribute.
Originator_ID: An optional non-transitive attribute.
Cluster_List: An optional non-transitive attribute.

The Origin attribute defines the origin of a route and marks the path of a
BGP route. The Origin attribute is classified into the following types:
IGP: A route with the Origin attribute IGP is an IGP route and
has the highest priority. For example, the Origin attribute of the
routes injected to the BGP routing table using the network
command is IGP.
EGP: A route with the Origin attribute EGP is an EGP route
and has the secondary highest priority.
Incomplete: A route with the Origin attribute Incomplete is
learned by other means and has the lowest priority. For
example, the Origin attribute of the routes imported by BGP
using the import-route command is Incomplete.

The AS_Path attribute records all the ASs that a route passes through
from a source to a destination in the distance-vector order. To prevent
inter-AS routing loops, a BGP device does not accept the EBGP routes
of which the AS_Path list contains the local AS number.
Assume that a BGP speaker advertises a local route:
When advertising the route to other ASs, the BGP speaker
adds the local AS number to the AS_Path list, and then
advertises it to neighboring routers in Update messages.
When advertising the route to the local AS, the BGP speaker
creates an empty AS_Path list in an Update message.
Assume that a BGP speaker advertises a route learned in the Update
message sent by another BGP speaker:
When advertising the route to other ASs, the BGP speaker
adds the local AS number to the leftmost of the AS_Path list.
According to the AS_Path attribute, the BGP router that
receives the route can determine the ASs through which the
route has passed to the destination. The number of the AS that
is nearest to the local AS is placed on the leftmost of the list,
and the other AS numbers are listed according to the
sequence in which the route passes through ASs.
When advertising the route to the local AS, the BGP speaker
does not change the AS_Path attribute of the route.

Topology description
When R4 advertises route 10.0.0.0/24 to AS 400 and AS 100,
it adds the local AS number to the AS_Path list. When R5
advertises the route to AS 100, it also adds the local AS
number to the AS_Path list. When R1 and R3 in AS 100
advertise the route to R2 in the same AS, they keep the
AS_Path attribute of the route unchanged. R2 selects the route
with the shortest AS_Path when other BGP routing rules are
the same. That is, R2 reaches 10.0.0.0/24 through R3.

The Next_Hop attribute records the next hop that a route passes
through. The Next_Hop attribute of BGP is different from that of an IGP
because it may not be the neighbor IP address. A BGP speaker
processes the Next_Hop attribute based on the following rules:
When advertising a locally originated route to an IBGP peer,
the BGP speaker sets the Next_Hop attribute of the route to be
the IP address of the local interface through which the BGP
peer relationship is established.
When advertising a route to an EBGP peer, the BGP speaker
sets the Next_Hop attribute of the route to be the IP address of
the local interface through which the BGP peer relationship is
established.
When advertising a route learned from an EBGP peer to an
IBGP peer, the BGP speaker does not change the Next_Hop
attribute of the route.
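The third rule is why border routers commonly rewrite the next hop for IBGP peers; otherwise routers inside the AS may receive routes whose next hop (an address in the neighboring AS) is unreachable. A hedged sketch using the peer next-hop-local command described later in this module (the peer address is hypothetical):

```
bgp 100
 # 2.2.2.2 is a hypothetical IBGP peer reached via loopback addresses
 peer 2.2.2.2 as-number 100
 # Rewrite the next hop to the local address when advertising
 # EBGP-learned routes to this IBGP peer
 peer 2.2.2.2 next-hop-local
```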

Local_Pref attribute
This attribute indicates the BGP preference of a router. It is
exchanged only between IBGP peers and not advertised to
other ASs.
This attribute helps determine the optimal route when traffic
leaves an AS. When a BGP router obtains multiple routes to
the same destination address but with different next hops from
IBGP peers, the router prefers the route with the highest
Local_Pref.
Topology description
R1, R2, and R3 are IBGP peers of one another in AS 100. R2 establishes
an EBGP peer relationship with AS 200, and R3 with AS 300. R2 and R3
each learn the route 10.0.0.0/24 over EBGP, so R1 learns two routes to
10.0.0.0/24 from its two IBGP peers (R2 and R3). To make R1 prefer the
route through R2, configure the Local_Pref attribute on R2 and R3: a
value of 300 on R2 and a value of 200 on R3. Because 300 is higher,
R1 prefers the route learned from R2.
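The Local_Pref value in this case could be set with a route-policy applied on R2 to the routes received from its EBGP peer. A sketch of the R2 side, with hypothetical policy names and peer address:

```
# Match the prefix learned from the external AS
ip ip-prefix NET10 index 10 permit 10.0.0.0 24
#
route-policy SET_LP permit node 10
 if-match ip-prefix NET10
 # Higher Local_Pref wins when traffic leaves the AS
 apply local-preference 300
#
bgp 100
 # 23.1.1.3 is a hypothetical EBGP peer address
 peer 23.1.1.3 route-policy SET_LP import
```

The same policy with apply local-preference 200 would be applied on R3.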

The MED attribute helps determine the optimal route when traffic enters
an AS. When a BGP router obtains multiple routes to the same
destination address but with different next hops from EBGP peers, the
router selects the route with the smallest MED value as the optimal
route if the other attributes of the routes are the same.
The MED attribute is exchanged only between two neighboring ASs.
The AS that receives this attribute does not advertise the attribute to
any other AS. This attribute can be manually configured. If the MED
attribute is not configured for a route, the MED attribute of the route
uses the default value 0.
Topology description
R1 and R2 advertise routes 10.0.0.0/24 to their respective
EBGP peers R3 and R4. When other routing rules are the
same, R3 and R4 prefer the route with a smaller MED value.
That is, R3 and R4 access network 10.0.0.0/24 through R1.
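On the VRP platform, the MED is set with apply cost in a route-policy applied in the export direction. A sketch of the R2 side (names and the peer address are hypothetical):

```
route-policy SET_MED permit node 10
 # A larger MED makes this entry point less preferred by the neighboring AS
 apply cost 100
#
bgp 100
 peer 24.1.1.4 route-policy SET_MED export
```

R1 would export the same prefix with a smaller MED (for example, apply cost 50), so the neighboring AS enters through R1.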

The Community attribute is a set of destination addresses with the
same characteristics. It is expressed as a list of 4-byte values, each
in the aa:nn or community number format.
aa:nn: The value of aa or nn ranges from 0 to 65535. The
administrator can set a specific value as required. Generally,
aa indicates the AS number and nn indicates the community
identifier defined by the administrator. For example, if a route
is from AS 100 and its community identifier defined by the
administrator is 1, the Community attribute is 100:1.
Community number: An integer that ranges from 0 to
4294967295. As defined in RFC 1997, numbers from 0
(0x00000000) to 65535 (0x0000FFFF) and from 4294901760
(0xFFFF0000) to 4294967295 (0xFFFFFFFF) are reserved.
The Community attribute helps simplify application, maintenance, and
management of routing policies. With the community, a group of BGP
routers in multiple ASs can share the same routing policy. This attribute
is a route attribute and is transmitted between BGP peers without being
restricted by ASs. Before advertising a route with the Community
attribute to peers, a BGP router can change the original Community
attribute of this route.
Well-known community attributes
Internet: All routes belong to the Internet community by default.
A route with this attribute can be advertised to all BGP peers.

No_Advertise: A device does not advertise a received route
with the No_Advertise attribute to any peer.
No_Export: A BGP device does not advertise a received route
with the No_Export attribute to devices outside the local AS. If
a confederation is defined, the route with the No_Export
attribute cannot be advertised to ASs outside of the
confederation but to other sub-ASs in the confederation.
No_Export_Subconfed: BGP device does not advertise the
received route with the No_Export_Subconfed attribute to
devices outside the local AS or to devices outside the local
sub-AS in a confederation.
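Communities are attached with a route-policy and, on the VRP platform, must be explicitly advertised to each peer. A hedged sketch with hypothetical values:

```
route-policy SET_COMM permit node 10
 # Tag routes with community 100:1 (AS 100, administrator-defined ID 1)
 apply community 100:1
#
bgp 100
 # 12.1.1.2 is a hypothetical peer address
 peer 12.1.1.2 route-policy SET_COMM export
 # Without this command, the Community attribute is not sent to the peer
 peer 12.1.1.2 advertise-community
```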

BGP routing rules
The next-hop addresses of routes must be reachable.
The PrefVal attribute is a Huawei proprietary attribute and is
valid only on the device where it is configured.
If a route does not have the Local_Pref attribute, the
Local_Pref attribute of the route uses the default value 100.
You can use the default local-preference command to
change the default local preference of BGP routes.
Locally generated routes include the routes imported using the
network or import-route command, manually summarized
routes, and automatically summarized routes.
Summarized routes have a higher priority than non-summarized routes.
Manually summarized routes generated using the
aggregate command have a higher priority than
automatically summarized routes generated using the
summary automatic command.
Routes imported using the network command have a
higher priority than routes imported using the import-route command.
Prefers the route with the shortest AS_Path.
The AS_Path length does not include
AS_CONFED_SEQUENCE and AS_CONFED_SET.
An AS_SET counts as 1 no matter how many AS
numbers the AS_SET contains.

BGP does not compare the AS_Path attributes of
routes after the bestroute as-path-ignore command is
executed.
Prefers the route with the lowest MED.
BGP compares only the MED values of routes sent
from the same AS (excluding a confederation sub-AS).
That is, BGP compares the MED values of two routes
only when the first AS numbers in the AS_SEQUENCE
attributes (excluding the AS_CONFED_SEQUENCE)
of the two routes are the same.
If a route does not have the MED attribute, BGP
considers the MED value of the route as the default
value 0. After the bestroute med-none-as-maximum
command is executed, BGP considers the MED value
of the route as the maximum value 4294967295.
After the compare-different-as-med command is
executed, BGP compares the MEDs in the routes sent
from peers in different ASs. Do not use this command
unless different ASs use the same IGP and route
selection mode, otherwise routing loops may occur.
After the bestroute med-confederation command is
executed, BGP compares the MED values of routes
only when the AS_Path does not contain external AS
numbers (sub-ASs that do not belong to a
confederation) and the first AS number in
AS_CONFED_SEQUENCE is the same.
After the deterministic-med command is executed,
routes are not selected in the sequence in which routes
are received.
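The MED-related route-selection behavior above is controlled by optional commands in the BGP view. A sketch that simply groups the commands named in this section (enable them only when the network design calls for it):

```
bgp 100
 # Treat a missing MED as 4294967295 instead of the default 0
 bestroute med-none-as-maximum
 # Compare MEDs even for routes from different ASs (use with care:
 # routing loops may occur unless the ASs use the same IGP and
 # route selection mode)
 compare-different-as-med
 # Group routes by neighboring AS before comparing MEDs, so route
 # selection no longer depends on the order in which routes arrive
 deterministic-med
```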

Load Balancing
When there are multiple equal-cost routes to the same
destination, you can load-balance traffic among these routes.
Equal-cost BGP routes can be generated for traffic load
balancing only when all the route-selection rules before the
rule "Prefers the route with the lowest IGP metric" produce
the same result.

BGP security
MD5: BGP uses TCP as the transport layer protocol. To
ensure BGP security, you can perform MD5 authentication
during the TCP connection setup. MD5 authentication,
however, does not authenticate BGP messages. Instead, it
sets the MD5 authentication password for a TCP connection,
and the authentication is performed by TCP. If the
authentication fails, no TCP connection is set up.
After GTSM is enabled for BGP, an interface board checks the
TTL values in all BGP messages. In actual networking,
packets whose TTL values are not within the specified range
are either allowed to pass through or discarded by GTSM. To
configure GTSM to discard packets by default, you can set a
correct TTL value range according to the network topology.
Subsequently, messages whose TTL values are not within the
specified range are discarded. This function avoids attacks
from bogus BGP messages. This function is mutually
exclusive to multi-hop EBGP.
The number of routes received from peers is limited to prevent
resource exhaustion attacks.
The AS_Path lengths on the inbound and outbound interfaces
are limited. Packets that exceed the limit of the AS_Path
length are discarded.
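The first two of these protections can be sketched as follows; the password and peer address are hypothetical:

```
bgp 100
 # MD5-authenticate the underlying TCP connection; both peers
 # must configure the same password or no session is set up
 peer 10.1.1.2 password cipher Huawei@123
 # Limit the number of prefixes accepted from this peer to guard
 # against resource exhaustion
 peer 10.1.1.2 route-limit 10000
```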

Route dampening helps solve the problem of route instability. In most
cases, BGP is used on complex networks where route flapping occurs
frequently. To prevent frequent route flapping, BGP uses route
dampening to suppress unstable routes.

Route dampening measures the stability of a route using a penalty
value. A larger penalty value indicates a less stable route. Each time
route flapping occurs, BGP increases the penalty of a route by a value
of 1000. During route flapping, a route changes from active to inactive.
When the penalty value of the route exceeds the suppression threshold,
BGP suppresses this route and does not add it to the IP routing table or
advertise any Update message to BGP peers.
After a route is suppressed for a period of time (half life), the penalty
value is reduced by half. When the penalty value of a route decreases
to the reuse threshold, the route becomes reusable and is added to the
routing table. At the same time, BGP advertises an Update message to
peers. The penalty value, suppression threshold, and half life can be
manually configured.
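On the VRP platform, these parameters are set with the dampening command in the BGP view. A sketch using commonly cited default-style values (half-life 15 minutes, reuse threshold 750, suppression threshold 2000, penalty ceiling 16000):

```
bgp 100
 # dampening half-life-reach reuse suppress ceiling
 dampening 15 750 2000 16000
```

A route whose penalty exceeds 2000 is suppressed; once the penalty decays below 750, the route becomes reusable again.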

Route dampening applies only to EBGP routes but not IBGP routes.
IBGP routes often include the routes from the local AS, which requires
that the forwarding tables of devices within an AS be the same. In
addition, IGP fast convergence aims to achieve information
synchronization.

If IBGP routes were dampened, forwarding tables on devices would be
inconsistent when these devices have different dampening parameters.
Route dampening therefore does not apply to IBGP routes.

Case description
IP addresses used to interconnect devices are designed as
follows:
If RTX connects to RTY, interconnected addresses are
XY.1.1.X and XY.1.1.Y. The network mask is 24 bits.
Loopback interface addresses of R1, R2, R3, R6, and
R7 are shown in the figure.
Case analysis
To establish stable IBGP peer relationships, use loopback
interface addresses and static routes within an AS.
To establish EBGP peer relationships, use physical interface
addresses.

Command usage
The peer as-number command sets the AS number of a
specified peer (or peer group).
The peer connect-interface command specifies a source
interface that sends BGP messages and a source address
used to initiate a connection.
The peer next-hop-local command configures a BGP device
to set its IP address as the next hop of routes when it
advertises the routes to an IBGP peer or peer group.
View
BGP process view
Parameters
peer ipv4-address as-number as-number
ipv4-address: specifies the IPv4 address of a peer.
as-number: specifies the AS number of the peer.
peer ipv4-address connect-interface interface-type interface-number [ ipv4-source-address ]
ipv4-address: specifies the IPv4 address of a peer.
interface-type interface-number: specifies the interface
type and number.
ipv4-source-address: specifies the IPv4 source address
used to set up a connection.
peer ipv4-address next-hop-local

ipv4-address: specifies the IPv4 address of a peer.
Precautions
When using a loopback interface to send BGP messages:
Ensure that the loopback interface address of the BGP
peer is reachable.
In the case of an EBGP connection, you need to run
the peer ebgp-max-hop command to enable EBGP to
establish the peer relationship in indirect mode.
The peer next-hop-local and peer next-hop-invariable
commands are mutually exclusive.
The PrefRcv field in the display bgp peer command output
indicates the number of route prefixes received from the peer.
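Combining the commands above, the R1 side of a stable IBGP session over loopback interfaces might look like the following sketch (addresses follow the case's XY.1.1.X convention; the router ID and loopback addresses are assumptions):

```
bgp 100
 router-id 1.1.1.1
 # IBGP peer R2, reached via its loopback address 2.2.2.2
 peer 2.2.2.2 as-number 100
 # Source BGP messages from the local loopback so the session
 # survives the failure of any single physical link
 peer 2.2.2.2 connect-interface LoopBack0
 # Rewrite the next hop of routes advertised to this IBGP peer
 peer 2.2.2.2 next-hop-local
```

A static route (or IGP route) to 2.2.2.2/32 must exist so that the peer's loopback address is reachable, as noted in the precautions.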

Case description
The topology in this case is the same as that in the previous
case. Perform the configuration based on the configuration in
the previous case.
R1 prefers routes to 10.0.X.0/24 with next hop R2 because
BGP prefers the route advertised by the router with the
smallest router ID.

Command usage
The peer route-policy command specifies a route-policy to
control routes received from, or to be advertised to a peer or
peer group.

View
BGP view
Parameters
peer ipv4-address route-policy route-policy-name { import | export }
ipv4-address: specifies an IPv4 address of a peer.
route-policy-name: specifies a route-policy name.
import: applies a route-policy to routes to be imported
from a peer or peer group.
export: applies a route-policy to routes to be advertised
to a peer or peer group.
Configuration verification
Run the display bgp routing-table command to view the BGP
routing table.

Case description
The topology in this case is the same as that in the previous
case. Company A requires that R1 access network 10.0.1.0/24
through R7. To meet this requirement, you can enable R4 to
access network 10.0.1.0/24 through R7 using the MED
attribute.

Command usage
The peer route-policy command specifies a route-policy to
control routes received from, or to be advertised to a peer or
peer group.

View
BGP view
Parameters
peer ipv4-address route-policy route-policy-name { import | export }
ipv4-address: specifies an IPv4 address of a peer.
route-policy-name: specifies a route-policy name.
import: applies a route-policy to routes to be imported
from a peer or peer group.
export: applies a route-policy to routes to be advertised
to a peer or peer group.
Configuration verification
Run the display bgp routing-table command to view the BGP
routing table.

Case description
The topology in this case is the same as that in the previous
case. To meet the requirement, use the Community attribute.

Command usage
The peer route-policy command specifies a route-policy to
control routes received from, or to be advertised to a peer or
peer group.

View
BGP view
Parameters
peer ipv4-address route-policy route-policy-name { import | export }
ipv4-address: specifies an IPv4 address of a peer.
route-policy-name: specifies a route-policy name.
import: applies a route-policy to routes to be imported
from a peer or peer group.
export: applies a route-policy to routes to be advertised
to a peer or peer group.
Configuration verification
Run the display bgp routing-table community command to
view the attributes in the BGP routing table.

Case description
This case is an extension to the previous case. Perform the
configuration based on the configuration in the previous case.

Command usage
The peer route-policy command specifies a route-policy to
control routes received from, or to be advertised to a peer or
peer group.
The peer default-route-advertise command configures a
BGP device to advertise a default route to its peer or peer
group.
View
peer route-policy: BGP view
peer default-route-advertise: BGP view
Parameters
peer ipv4-address route-policy route-policy-name { import | export }
ipv4-address: specifies an IPv4 address of a peer.
route-policy-name: specifies a route-policy name.
import: applies a route-policy to routes to be imported
from a peer or peer group.
export: applies a route-policy to routes to be advertised
to a peer or peer group.
peer { group-name | ipv4-address } default-route-advertise [ route-policy route-policy-name ] [ conditional-route-match-all { ipv4-address1 { mask1 | mask-length1 } } &<1-4> | conditional-route-match-any { ipv4-address2 { mask2 | mask-length2 } } &<1-4> ]

ipv4-address: specifies an IPv4 address of a peer.
route-policy route-policy-name: specifies a route-policy name.
conditional-route-match-all ipv4-address1 { mask1 | mask-length1 }: specifies the IPv4
address and mask/mask length for conditional routes.
The default routes are sent to the peer or peer group
only when all conditional routes are matched.
conditional-route-match-any ipv4-address2 { mask2 | mask-length2 }: specifies the IPv4
address and mask/mask length for conditional routes.
The default routes are sent to the peer or peer group
only when any conditional route is matched.
Configuration verification
Run the display ip routing-table command to view IP routing
table information.
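In its simplest form, the command is applied per peer; a minimal sketch (the peer address is hypothetical):

```
bgp 100
 # Advertise a default route (0.0.0.0/0) to this peer, regardless of
 # whether a default route exists in the local routing table
 peer 10.1.1.2 default-route-advertise
```

The conditional-route-match-all/any options shown above make the advertisement depend on the presence of the listed conditional routes.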

Case description
This case is an extension to the previous case. Perform the
configuration based on the configuration in the previous case.

Command usage
The maximum load-balancing command configures the
maximum number of equal-cost routes.
View
BGP view
Parameters
maximum load-balancing [ ebgp | ibgp ] number
ebgp: implements load balancing among EBGP routes.
ibgp: implements load balancing among IBGP routes.
number: specifies the maximum number of equal-cost
routes in the BGP routing table.
Precautions
The maximum load-balancing number command cannot be
used together with the maximum load-balancing ebgp
number or maximum load-balancing ibgp number command.
If the maximum load-balancing ebgp number or maximum
load-balancing ibgp number command is executed, the
maximum load-balancing number command does not take
effect.
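A minimal sketch of the command, with an assumed AS number and an assumed maximum of two equal-cost routes:

```
# Assumed example: allow up to 2 equal-cost EBGP routes and 2 equal-cost
# IBGP routes in the BGP routing table.
bgp 100
 maximum load-balancing ebgp 2
 maximum load-balancing ibgp 2
 # Alternatively (mutually exclusive with the two commands above):
 # maximum load-balancing 2
```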

Configuration verification
Run the display ip routing-table protocol bgp command to
view the load-balanced routes learned by BGP.

Case description
This case is an extension to the previous case. Perform the
configuration based on the configuration in the previous case.
After GTSM is enabled between R6 and R8, the hop count
should be 1.

Command usage
The peer valid-ttl-hops command applies the GTSM function
on the peer or peer group.
The gtsm default-action command configures the default
action to be taken on the packets that do not match the GTSM
policy.
The gtsm log drop-packet command enables the log function
on a board to log information about the packets discarded by
GTSM on the board.
View
peer valid-ttl-hops: BGP view
gtsm default-action: system view
gtsm log drop-packet: system view
Parameters
peer ipv4-address valid-ttl-hops [ hops ]
ipv4-address: specifies the IPv4 address of a peer.
hops: specifies the number of TTL hops to be checked.
The value is an integer that ranges from 1 to 255. The default
value is 255. If the value is configured as hops, the valid TTL
range of the detected packet is [255 - hops + 1, 255].
gtsm default-action { drop | pass }

drop: discards the packets that do not match the GTSM policy.
pass: allows the packets that do not match the GTSM
policy to pass through.

Precautions
GTSM and EBGP-MAX-HOP affect the TTL values of sent
BGP packets. The two functions are mutually exclusive.
If the default action is configured but the GTSM policy is not
configured, GTSM does not take effect.
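A minimal GTSM sketch for the directly connected EBGP peers in this case; the peer address and slot number are assumed:

```
# Accept BGP packets from the directly connected peer only if the TTL is
# within [255 - 1 + 1, 255], i.e. exactly one hop away.
bgp 100
 peer 10.68.1.8 valid-ttl-hops 1
quit
# In the system view: drop packets that do not match the GTSM policy.
gtsm default-action drop
# Log packets discarded by GTSM on slot 1 (slot number is assumed).
gtsm log drop-packet slot 1
```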

Case description
In the topology, among the IP addresses that are not marked,
Rx and Ry connect using IP addresses XY.1.1.X/24 and
XY.1.1.Y/24.

Results
Run the display vlan command to view the results.

Results
Run the display bgp peer command to view the BGP peer
relationship.

Results
Run the display bgp routing-table command to view the BGP
routing table. The command output shows that 2.2.2.2/32 and
3.3.3.3/32 have been advertised.

Results
The loop is the result of inconsistency between IGP route
selection and BGP route selection.

Case description
In the topology, among the IP addresses that are not marked,
Rx and Ry connect using IP addresses XY.1.1.X/24 and
XY.1.1.Y/24.

Analysis process
Run the display bgp routing-table community command to
view the attributes.

Results
You will notice that the Community attribute of route
10.0.0.0/24 is labeled as <400:1>, no-export on R2.

Results
You can add the AS_Path Attribute to change the route
selection of R3.

To ensure connectivity between IBGP peers, you need to establish full-mesh connections among them. If there are n routers in an AS,
you need to establish n(n-1)/2 IBGP connections. When there are a
large number of IBGP peers, many network resources and CPU
resources are consumed. A route reflector (RR) can be used between
IBGP peers to solve this problem.
In an AS, a router functions as an RR, and other routers function as
clients. The RR and its clients establish IBGP connections and form a
cluster. The RR reflects routes to clients, removing the need to
establish BGP connections between clients.
RR concepts
RR: a BGP device that can reflect the routes learned from an
IBGP peer to other IBGP peers.
Client: an IBGP device whose routes are reflected by an RR
to other IBGP devices. In an AS, clients only need to directly
connect to the RR.
Non-client: an IBGP device that is neither an RR nor a client.
In an AS, a non-client must establish full-mesh connections
with the RR and all the other non-clients.
Originator: a device that originates routes in an AS. The
Originator_ID attribute helps eliminate routing loops in a
cluster.
Cluster: a set of an RR and clients. The Cluster_List attribute
helps eliminate routing loops between clusters.

An RR advertises learned routes to IBGP peers based on the following rules:
The RR advertises the routes learned from an EBGP peer to
all the clients and non-clients.
The RR advertises the routes learned from a non-client IBGP
peer to all the clients.
The RR advertises the routes learned from a client to all the
other clients and all the non-clients.
An RR is easy to configure because it needs to be configured only on
the device that functions as a reflector, and clients do not need to know
that they are clients.
In some networks, if clients of an RR establish full-mesh connections
among themselves, they can directly exchange routing information. In
this case, route reflection between clients is unnecessary and wastes
bandwidth. You can run the undo reflect between-clients command
on the VRP Platform to prohibit an RR from reflecting the routes
received from a client to other clients.
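The reflection rules above can be sketched as follows; device addresses and the AS number are assumed for illustration:

```
# Assumed RR sketch: 2.2.2.2 and 3.3.3.3 are clients; other IBGP peers
# configured without reflect-client are treated as non-clients.
bgp 100
 peer 2.2.2.2 reflect-client
 peer 3.3.3.3 reflect-client
 # If the clients are already fully meshed, stop reflecting between them:
 undo reflect between-clients
```

Only the RR is configured this way; the clients themselves need no RR-specific configuration.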

The originator ID identifies the originator of a route and is generated by an RR to prevent routing loops in a cluster.
When an RR reflects a route for the first time, the RR adds the
Originator_ID attribute to this route. The Originator_ID attribute
identifies the originator of the route. If the route already
contains the Originator_ID attribute, the RR retains this
Originator_ID attribute.
When a device receives a route, the device compares the
originator ID of the route with the local router ID. If they are the
same, the device discards the route.
An RR and its clients form a cluster, which is identified by a unique
cluster ID in an AS.
To prevent routing loops between clusters, an RR uses the Cluster_List
attribute to record the cluster IDs of all the clusters that a route
passes through.
When an RR reflects a route between clients, or between
clients and non-clients, the RR adds the local cluster ID to the
top of the cluster list. If there is no cluster list, the RR creates a
Cluster_List attribute.
When receiving an updated route, the RR checks the cluster
list of the route. If the cluster list contains the local cluster ID,
the RR discards the route. If the cluster list does not contain
the local cluster ID, the RR adds the local cluster ID to the
cluster list and then reflects the route.

Backup RR prevents single-point failures.


Backup RR
On the VRP, you need to run the reflector cluster-id
command to set the same cluster ID for all the RRs in the
same cluster.
When redundant RRs exist, a client receives multiple routes to
the same destination from different RRs and then selects the
optimal route according to BGP route selection policies.
The Cluster_List attribute prevents routing loops between
different RRs in the same AS.
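A minimal sketch of the backup-RR design described above, with an assumed cluster ID; because both RRs share the cluster ID, each discards routes already reflected by the other:

```
# On RR1:
bgp 100
 reflector cluster-id 1.1.1.1
 peer 2.2.2.2 reflect-client
# On RR2 (same cluster ID as RR1):
bgp 100
 reflector cluster-id 1.1.1.1
 peer 2.2.2.2 reflect-client
```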
Topology description
When Client1 receives an updated route 10.0.0.0/24 from an
external peer, it advertises the route to RR1 and RR2 through
IBGP.
After RR1 receives the updated route, it reflects the route to
other clients (Client2 and Client3) and adds the local cluster ID
to the top of the cluster list.
After RR2 receives the updated route, it checks the cluster list
and finds that its cluster ID has been contained in the cluster
list. Subsequently, it discards the route without reflecting the
route to its clients.

A backbone network is divided into multiple clusters. RRs of the clusters are non-clients and establish full-mesh connections with one another. Although each client only establishes an IBGP connection with
its RR, all the BGP routers in the AS can receive reflected routing
information.

A level-1 RR (RR1) is deployed in Cluster1, while the RRs (RR2 and RR3) in Cluster2 and Cluster3 function as clients of RR1.

Confederation
A confederation divides an AS into sub-ASs. Full-mesh IBGP
connections are established in each sub-AS, while EBGP
connections are established between sub-ASs. ASs outside a
confederation still consider the confederation as an AS.
After a confederation divides an AS into sub-ASs, it assigns a
confederation ID (the AS number) to each router within the AS.
The original IBGP attributes are retained, including the
Local_Pref attribute, MED attribute, and Next_Hop attribute.
Confederation-related attributes are automatically deleted
when being advertised outside a confederation. The
administrator therefore does not need to configure the rules for
filtering information such as sub-AS numbers at the egress of
a confederation.
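The confederation design can be sketched as follows; the sub-AS numbers and peer address are assumed:

```
# Assumed example: AS 100 is split into sub-AS 64512 and sub-AS 64513.
# On a router in sub-AS 64512:
bgp 64512
 confederation id 100
 confederation peer-as 64513
 # EBGP-style session to a router (10.1.1.2, assumed) in sub-AS 64513:
 peer 10.1.1.2 as-number 64513
```

ASs outside the confederation see only AS 100; the sub-AS numbers are stripped at the confederation boundary.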

The AS_Path attribute is a well-known mandatory attribute. It consists of AS numbers and has the following types:
AS_SET: comprises an unordered set of AS numbers
and is carried in an Update message. When network
summarization occurs, you can use policies to prevent path
information loss using AS_SET.
AS_SEQUENCE: comprises a series of ASs in sequence and
is carried in an Update message. Generally, the AS_Path type
is AS_SEQUENCE.
AS_CONFED_SEQUENCE: comprises a series of member
ASs in a confederation in sequence and is carried in an
Update message. Similar to AS_SEQUENCE,
AS_CONFED_SEQUENCE can only be transmitted within a
local confederation.
AS_CONFED_SET: comprises an unordered set of member AS numbers in a confederation and is carried in an
Update message. Similar to AS_SET, AS_CONFED_SET can
only be transmitted within a local confederation.
Member AS numbers within a confederation are invisible to other ASs
outside the confederation. When routes are therefore advertised to
other ASs outside the confederation, member AS numbers are
removed.

Comparison between a route reflector and a confederation


A confederation requires an AS to be divided into sub-ASs,
which significantly changes the logical topology.
Only an RR needs to be configured, and clients do not need to
be configured. The confederation needs to be configured on all
the devices.
RRs must establish full-mesh IBGP connections.
Route reflectors are widely used, while confederations are
seldom used.

The BGP routing table of each device on a large network is large. This
burdens devices, increases the route flapping probability, and affects
network stability.
Route summarization is a mechanism that combines multiple routes
into one route. This mechanism allows a BGP device to advertise only
the summarized route but not all the specific routes to peers. It reduces
the BGP routing table size. If the specific routes flap, the network is not
affected, therefore improving network stability.
Route summarization uses the Aggregator attribute. This attribute is an
optional transitive attribute and identifies the node where route
summarization occurs and carries the router ID and AS number of the
node.

Precautions
The summary automatic command summarizes the routes
imported by BGP, including direct routes, static routes, RIP
routes, OSPF routes, and IS-IS routes. After summarization is
configured, BGP summarizes routes according to the natural
network segment and suppresses specific routes in the BGP
routing table. This command is not valid for the routes
imported using the network command.
BGP advertises only summarized routes to peers.
BGP does not start automatic summarization by default.
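A minimal sketch of automatic summarization; the AS number and the imported route type are assumed:

```
# Assumed example: summarize imported routes on natural (classful) boundaries.
bgp 100
 import-route static
 # With summary automatic, imported static routes such as 10.1.1.0/24 and
 # 10.1.2.0/24 are advertised as the classful summary 10.0.0.0/8, and the
 # specific routes are suppressed in the BGP routing table.
 summary automatic
```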

Manual summarization
Summarized routes do not carry the AS_Path attribute of the
specific routes.
Using the AS_SET attribute to carry the AS number can
prevent routing loops. Differences between AS_SET and
AS_SEQUENCE are as follows: In AS_SET, the AS list is
often used to perform route summarization, and AS numbers
are added to the AS list in a disorderly manner. In
AS_SEQUENCE, AS numbers are added to the AS list in the
sequence in which a route passes through.
Adding the AS_SET attribute to summarized routes may cause
route flapping.

RFC 5291 and RFC 5292 define the prefix-based BGP outbound route
filtering (ORF) capability to advertise required BGP routes. BGP ORF
allows a device to send prefix-based inbound policies in a RouteRefresh message to BGP peers. BGP peers then construct outbound
policies based on these inbound policies to filter routes before sending
these routes. This capability has the following advantages:
Prevents the local device from receiving a large number of
unnecessary routes.
Reduces CPU usage of the local device.
Simplifies the configuration of BGP peers.
Improves link bandwidth efficiency.
Case description
Among directly-connected EBGP peers, after negotiating the
prefix-based ORF capability with R1, Client1 adds local prefix-based inbound policies to a Route-Refresh message and
sends the message to R1. R1 then constructs outbound
policies based on the received Route-Refresh message and
sends required routes to Client1 using a Route-Refresh
message. Client1 receives only the required routes, and R1
does not need to maintain routing policies. In this manner, the
configuration workload is reduced.
Client1 and Client2 are clients of the RR. Client1, Client2, and
the RR negotiate the prefix-based ORF capability. Client1 and
Client2 then add local prefix-based inbound policies to Route-Refresh messages and send the messages to the RR.

The RR constructs outbound policies based on the received inbound policies and reflects required routes in Route-Refresh
messages to Client1 and Client2. Client1 and Client2 receive only
the required routes, and the RR does not need to maintain routing
policies. The configuration workload is thereby reduced.

Active-Route-Advertise
Once a route is preferred by BGP, the route can be advertised
to peers by default. When Active-Route-Advertise is configured,
only the route preferred by BGP and also active at the routing
management layer is advertised to peers.
Active-Route-Advertise and the bgp-rib-only command are
mutually exclusive. The bgp-rib-only command prohibits BGP
routes from being advertised to the IP routing table.

BGP dynamic update peer-groups


By default, BGP groups and sends routes on a per-peer basis,
even if the peers share the same outbound policy.
After this feature is enabled, BGP groups each route only once
and then sends the route to all the peers in the update-group,
significantly improving grouping efficiency.
Topology description
RR1 has three clients and needs to reflect 100,000 routes to these
clients. If RR1 sends the routes grouped per peer to the three clients,
the total number of times that all routes are grouped is 300,000
(100,000 x 3). After the dynamic update peer-groups feature is used,
the total number of grouping times changes to 100,000 (100,000 x 1),
improving grouping performance by a factor of 3.

Roles defined in 4-byte AS number


New speaker: a peer that supports 4-byte AS numbers
Old speaker: a peer that does not support 4-byte AS numbers
New session: a BGP connection between new speakers
Old session: a BGP connection between a new speaker and
an old speaker, or between old speakers.
Protocol extension
Two new optional transitive attributes, AS4_Path with attribute
code 0x11 and AS4_Aggregator with attribute code 0x12, are
defined to transmit 4-byte AS numbers in old sessions.
If a BGP connection is set up between a new speaker and an
old speaker, a reserved AS number AS_TRANS (value 23456) is
defined for interoperability between 4-byte and 2-byte AS numbers.
New AS numbers have three formats:
asplain: represents an AS number using a decimal
integer.
asdot+: represents an AS number using two integer
values joined by a period character: <high order 16-bit
value in decimal>.<low order 16-bit value in decimal>.
For example, the 2-byte AS number 123 is represented as
0.123, and the AS number 65536 is represented as 1.0. The
largest value is 65535.65535.

asdot: represents a 2-byte AS number in the asplain format and a 4-byte AS number in the asdot+ format (1 to 65535; 1.0 to 65535.65535).
Huawei supports the asdot format.
Topology description
R2 receives a route with a 4-byte AS number 10.1 from R1.
R2 establishes a peer relationship with R3 and needs to
enable R3 to consider the AS number of R2 as AS_TRANS.
When advertising a route to R3, R2 records AS_TRANS in the
AS_Path attribute of the route and records 10.1 and its AS
number 20.1 to the AS4_Path attribute in the sequence
required by BGP.
R3 retains the unrecognized AS4_Path attribute and
advertises the route to R4 according to BGP rules and
considers the AS number of R2 as AS_TRANS.
When receiving the route from R3, R4 replaces AS_TRANS
with the AS numbers recorded in the AS4_Path attribute and
records the AS_Path as 30 20.1 10.1.

Next-hop iteration based on routing policy


BGP needs to iterate indirect next hops. If indirect next hops
are not iterated according to the routing policy, routes may be
iterated to incorrect forwarding paths. Next hops should
therefore be iterated according to certain conditions to control
the iterated routes. If a route cannot pass the routing policy,
the route is ignored and route iteration fails.
Topology description
IBGP peer relationships are established between R1 and R2,
and between R1 and R3 through loopback interfaces. R1
receives a BGP route with prefix 10.0.0.0/24 from R2 and R3.
The original next hop of the BGP route received from R2 is
2.2.2.2. The IP address of Ethernet0/0/0 of R1 is 2.2.2.100/24.
When R2 is running normally, the BGP route with prefix
10.0.0.0/24 is iterated to the IGP route 2.2.2.2/32. When the
IGP on R2 becomes faulty, the IGP route 2.2.2.2/32 is
withdrawn. This causes route iteration again. On R1, a route is
searched for in the IP routing table based on the original next
hop 2.2.2.2. Consequently, the route is iterated to 2.2.2.0/24.
The user expects that: when the route with the next hop
2.2.2.2 becomes unreachable, the route with the next hop
3.3.3.3 is preferred. In practice, however, BGP convergence
takes time, and the fault results in a transient routing black hole.

With the next-hop iteration policy, you can control the mask
length of the route through which the original next hop can be
iterated. After the next-hop iteration policy is configured, the
route with the original next hop 2.2.2.2 depends on only the
IGP route 2.2.2.2/32.
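The next-hop iteration policy described above can be sketched as follows; the prefix-list and route-policy names are assumed:

```
# Assumed example: allow BGP next hops to be iterated only through /32
# host routes, so next hop 2.2.2.2 cannot fall back to the interface
# route 2.2.2.0/24 when the IGP route 2.2.2.2/32 is withdrawn.
ip ip-prefix HOST32 index 10 permit 0.0.0.0 0 greater-equal 32 less-equal 32
route-policy NH-ITER permit node 10
 if-match ip-prefix HOST32
quit
bgp 100
 nexthop recursive-lookup route-policy NH-ITER
```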

Session setup between peers


A session can be set up between BGP speakers through
directly connected or loopback interfaces. Generally, IBGP
neighbors establish peer relationships through loopback
interfaces, while EBGP neighbors establish peer relationships
through directly connected physical interfaces.
You can configure authentication to ensure security for
sessions between peers.
Logical full-mesh connections must be set up between IBGP
peers (no RR or confederation is used).
You can prohibit synchronization to reduce the IGP load.
Route update origin
Routes can be imported into BGP using the import-route or
network command.
Routing policy optimization
You can optimize BGP routes using inbound policies,
outbound policies, and ORF.
Route filtering and attribute control
You can filter the routes to be advertised or received.
You can control BGP route attributes to affect BGP route
propagation.
Route summarization
Route summarization can optimize BGP routing entries and
reduce the routing table size.

Redundancy
Path redundancy ensures that a backup path is available when
a network fault occurs.
Traffic symmetry
Scientific network design and policy application can ensure
consistent paths for incoming and outgoing traffic.
Load balancing
When multiple paths to the same destination exist, traffic can
be load balanced through policies to fully utilize bandwidth.

Interaction between non-BGP routes and BGP routes


Generally, non-BGP routes can be imported into the BGP
routing table using the import-route or network command.
Control of default routes
Default routes can be advertised or received according to
conditions of routing policies.
Policy-based routing
Traffic paths can be optimized through PBR.

Dynamic update peer-groups: greatly improves router performance.


Route reflector and confederation: reduces the number of IBGP
sessions and optimizes large BGP networks.

Reduce unstable routes


Use stable IGPs.
Improve router performance.
Reduce manual errors.
Expand link bandwidth.
Improve BGP stability
Use BGP soft reset when using new BGP policies.
Punish unstable routes correctly to reduce the impact of these
routes on BGP.

Case description
IP addresses used to interconnect devices are as follows:
If RTX connects to RTY, interconnected addresses are
XY.1.1.X and XY.1.1.Y. Network mask is 24.
OSPF runs normally, and the interconnected addresses and
loopback interface addresses have been advertised into OSPF.
However, 10.0.X.0/24, 172.15.X.0/24, and 172.16.X.0/24 are
not advertised into OSPF.
Case analysis
EBGP peer relationships are established using loopback
interfaces.

Command usage
The peer as-number command sets an AS number for a
specified peer or peer group.
The peer connect-interface command specifies a source
interface that sends BGP messages and a source address
used to initiate a connection.
The peer next-hop-local command configures a BGP device
to set its IP address as the next hop of routes when it
advertises routes to an IBGP peer or peer group.
The group command creates a peer group.
View
BGP process view
Parameters
peer ipv4-address as-number as-number
ip-address: specifies the IPv4 address of a peer.
as-number: specifies the AS number of the peer.
peer ipv4-address connect-interface interface-type interface-number [ ipv4-source-address ]
ip-address: specifies the IPv4 address of a peer.
interface-type interface-number: specifies the interface
type and number.
ipv4-source-address: specifies the IPv4 source address
used to set up a connection.

peer ipv4-address next-hop-local


ip-address: specifies the IPv4 address of a peer.
group group-name [ external | internal ]
group-name: specifies the name of a peer group.
external: creates an EBGP peer group.
internal: creates an IBGP peer group.
Precautions
When configuring a device to use a loopback interface as the
source interface of BGP messages, note the following points:
The loopback interface of the device's BGP peer must
be reachable.
In the case of an EBGP connection, the peer ebgp-max-hop command must be executed to enable the
two devices to establish an indirect peer relationship.
The peer next-hop-local and peer next-hop-invariable
commands are mutually exclusive.
The Rec field in the display bgp peer command output
indicates the number of route prefixes received from the peer.
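The commands above can be combined as follows; the router ID, peer addresses, AS numbers, and group name are assumed:

```
# Assumed sketch: IBGP peering over loopback interfaces plus an
# indirect EBGP session.
bgp 100
 router-id 1.1.1.1
 # IBGP peer group: source the session from LoopBack0 and rewrite the
 # next hop of advertised routes to the local address.
 group IBGP-PEERS internal
 peer IBGP-PEERS connect-interface LoopBack0
 peer IBGP-PEERS next-hop-local
 peer 2.2.2.2 group IBGP-PEERS
 # EBGP over loopbacks (addresses assumed): ebgp-max-hop is required
 # because the peers are not directly connected.
 peer 5.5.5.5 as-number 200
 peer 5.5.5.5 connect-interface LoopBack0
 peer 5.5.5.5 ebgp-max-hop 2
```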

Case description
The topology in this case is the same as that in the previous
case. Perform the configurations based on the configuration in
the previous case.
If all the clients of the RR have established logically full-mesh
connections, the clients can transmit routes to each other
without requiring the RR to reflect routes to them. In this
situation, prohibit the RR from reflecting routes to clients so as
to reduce the RR load.

Command usage
The undo reflect between-clients command prohibits an RR
from reflecting routes to clients. This command is executed on
an RR. After this command is executed, clients can directly
exchange BGP messages, while R2 does not need to reflect
routes to these clients. However, R2 still reflects the routes
that are advertised by non-clients.
View
BGP view
Configuration verification
Run the display bgp peer command to view detailed BGP
peer information.
To reduce the RR load, prohibit BGP routes from being added
to the IP routing table and prevent the RR from forwarding
packets. Disabling route reflection between clients however
can better meet the full-mesh scenario requirement.

Case description
The topology in this case is the same as that in the previous
case. To meet the first requirement, use a route-policy to
advertise interface routing information.
To meet the second requirement, use an IP prefix list to filter
routes.

Command usage
The peer ip-prefix command configures a route filtering policy
based on an IP prefix list for a peer or peer group.
View

BGP view
Parameters
peer { group-name | ipv4-address } ip-prefix ip-prefix-name { import | export }
group-name: specifies the name of a peer group.
ipv4-address: specifies the IPv4 address of a peer.
ip-prefix-name: specifies the name of an IP prefix list.
import: applies a filtering policy to the routes received
from a peer or peer group.
export: applies a filtering policy to the routes sent to a
peer or peer group.
Configuration verification
Run the display bgp routing-table command to view the BGP
routing table.
For the same node in a route-policy, the relationship between
if-match clauses is AND. A route needs to meet all the
matching rules before the actions defined by apply clauses
are performed.

The relationship between the if-match clauses in the if-match route-type and if-match interface commands is "OR", but the relationship
between the if-match clauses in the two commands and other
commands is "AND".
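A minimal sketch of prefix-list-based filtering for a peer; the prefix-list name, peer address, and prefixes are assumed:

```
# Assumed example: advertise only 172.16.0.0/16 and its subnets down to
# /24 to the peer; all other routes are filtered out.
ip ip-prefix FILTER-OUT index 10 permit 172.16.0.0 16 less-equal 24
bgp 100
 peer 10.1.1.2 ip-prefix FILTER-OUT export
```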

Case description
This case is an extension to the previous case. Perform the
configuration based on the configuration in the previous case.
In requirement 2, the delivery of a default route depends on
route 172.16.0.0/16. If route 172.16.0.0/16 disappears, the
default route also disappears.

Command usage
The peer route-policy command specifies a route-policy to
control routes received from or to be advertised to a peer or
peer group.
The peer default-route-advertise command configures a
BGP device to advertise a default route to its peer or peer
group.
View
peer route-policy: BGP view
peer default-route-advertise: BGP view
Parameters
peer ipv4-address route-policy route-policy-name { import | export }
ipv4-address: specifies an IPv4 address of a peer.
route-policy-name: specifies a route-policy name.
import: applies a route-policy to routes to be imported
from a peer or peer group.
export: applies a route-policy to routes to be advertised
to a peer or peer group.
peer { group-name | ipv4-address } default-route-advertise [ route-policy route-policy-name ] [ conditional-route-match-all { ipv4-address1 { mask1 | mask-length1 } } &<1-4> | conditional-route-match-any { ipv4-address2 { mask2 | mask-length2 } } &<1-4> ]

ipv4-address: specifies an IPv4 address of a peer.


route-policy route-policy-name: specifies a route-policy name.
conditional-route-match-all ipv4-address1 { mask1 | mask-length1 }: specifies the IPv4
address and mask/mask length for conditional routes.
The default routes are sent to the peer or peer group
only when all conditional routes are matched.
conditional-route-match-any ipv4-address2 { mask2 | mask-length2 }: specifies the IPv4
address and mask/mask length for conditional routes.
The default routes are sent to the peer or peer group
only when any conditional route is matched.
Configuration verification
Run the display ip routing-table command to view
information about the IP routing table.

Case description
This case is an extension to the previous case. Perform the
configuration based on the configuration in the previous case.

Command usage
The aggregate command creates an aggregated route in the
BGP routing table.
View

BGP view
Parameters
aggregate ipv4-address { mask | mask-length } [ as-set | attribute-policy route-policy-name1 | detail-suppressed | origin-policy route-policy-name2 | suppress-policy route-policy-name3 ] *
ipv4-address: specifies the IPv4 address of an
aggregated route.
mask: specifies the network mask of an aggregated
route.
mask-length: specifies the network mask length of an
aggregated route.
as-set: generates a route with the AS-SET attribute.
attribute-policy route-policy-name1: specifies the name
of an attribute policy for aggregated routes.
detail-suppressed: advertises only the aggregated
route.
origin-policy route-policy-name2: specifies the name of
a policy that allows route aggregation.

suppress-policy route-policy-name3: specifies the name of a policy for suppressing the advertisement of specified routes.
Precautions
During manual or automatic summarization, routes pointing to
NULL0 are generated locally.
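A minimal sketch of manual aggregation; the summary prefix and AS number are assumed:

```
# Assumed example: advertise only the summary 172.16.0.0/16 and carry
# the AS numbers of the specific routes in an AS_SET to prevent loops.
bgp 100
 aggregate 172.16.0.0 16 as-set detail-suppressed
```

The locally generated route to NULL0 mentioned above acts as a discard route for traffic matching the summary but no specific route.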
Configuration verification
Run the display ip routing-table protocol bgp command to
view the routes learned by BGP.

Case description
This case is an extension to the previous case. Perform the
configuration based on the configuration in the previous case.
BGP on-demand route advertisement requires ORF to be
enabled on R4, R5, and R6.

Command usage
The peer capability-advertise orf command enables prefix-based ORF for a peer or peer group.
View

BGP view
Parameters
peer { group-name | ipv4-address } capability-advertise
orf [ cisco-compatible ] ip-prefix { both | receive | send }
group-name: specifies the name of a peer group.
ipv4-address: specifies the IPv4 address of a peer.
cisco-compatible: is compatible with Cisco devices.
both: allows the device to send and receive ORF
packets.
receive: allows the device to receive only ORF packets.
send: allows the device to send only ORF packets.
Precautions
BGP ORF has three modes: send, receive, and both. In send
mode, a BGP device can send ORF information. In receive
mode, a BGP device can receive ORF information. In both
mode, a BGP device can send and receive ORF information.

To enable a BGP device that advertises routes to receive ORF IP-prefix information, configure this device to work in receive or both mode and the peer device to work in send or both mode.
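A minimal two-sided ORF sketch; peer addresses, AS numbers, and the prefix-list name are assumed:

```
# On the route advertiser (receives the peer's ORF information):
bgp 100
 peer 10.1.1.2 capability-advertise orf ip-prefix receive
# On the route receiver (sends its inbound ip-prefix policy as ORF):
ip ip-prefix ORF-IN index 10 permit 10.0.0.0 24
bgp 200
 peer 10.1.1.1 ip-prefix ORF-IN import
 peer 10.1.1.1 capability-advertise orf ip-prefix send
```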
Configuration verification
Run the display bgp peer 1.1.1.1 orf ip-prefix command to
view prefix-based BGP ORF information received from a
specified peer.

Case description
IP addresses used to interconnect devices are as follows:
If RTX connects to RTY, interconnected addresses are
XY.1.1.X and XY.1.1.Y. Network mask is 24.

Results
The configuration is the basic OSPF configuration.

Results
Run the display bgp peer command to view the BGP peer
status.
Run the display bfd session all command to view the BFD
session. In the command output, D_IP_IF indicates that a BFD
session is dynamically created and bound to an interface.

Results
Run the display bgp routing-table command to view BGP
routing entries. The command output shows that R3 learns two
routes 10.0.0.0/24 from R2 and R4. According to BGP routing
rules, R3 prefers the route 10.0.0.0/24 learned from R2.

Case description
This case is an extension to the previous case. Perform the
configuration based on the configuration in the previous case.

Analysis process
You can use peer groups to reduce the RR load.

Results
Run the display bgp routing-table community command to
view the Community attribute.

Results
Run the display bgp routing-table community command to
view the Community attribute. The Community attribute is no-export. That is, the route is not advertised to EBGP peers.

ACL

An ACL is a series of sequential rules composed of permit and deny clauses. These rules match packet information to classify packets. Based on ACL rules, a router permits or denies packets.
An Access Control List (ACL) is a set of sequential rules. The
ACL filters packets according to the specified rules. With the
rules applied to a device, the device permits or denies the
packets according to the rules.
IP prefix list
An IP prefix list filters matching routes in defined matching
mode to meet requirements.
An IP prefix list filters only routing information but not packets.
AS_Path filter
Each BGP route contains an AS path attribute. AS path
filters specify matching rules regarding AS path attribute.
AS path filters are exclusively used in BGP.

Community filter
Community filters are exclusively used in BGP. Each BGP
route contains a community attribute to identify a
community. Community filters specify matching rules
regarding community values.

ACL management rule


An ACL can contain multiple rules.
A rule is identified by a rule ID, which can be set by a user or
automatically generated based on the ACL step. All the rules
in an ACL are arranged in ascending order of rule IDs.
There is a step between rule IDs. If no rule ID is specified, the
step is determined by the ACL step. You can add new rules to
a rule group based on the rule ID.
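The rule-numbering behavior described above can be sketched on a VRP device (the ACL number and addresses here are hypothetical examples):

```
[Huawei] acl number 2000
[Huawei-acl-basic-2000] rule permit source 10.0.1.0 0.0.0.255
[Huawei-acl-basic-2000] rule permit source 10.0.2.0 0.0.0.255
[Huawei-acl-basic-2000] rule 7 deny source 10.0.1.100 0
```

With the default step of 5, the first two rules are automatically numbered 5 and 10, and the gap allows rule 7 to be inserted between them later. Run the display acl 2000 command to confirm the assigned rule IDs.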
ACL rule management
When a packet reaches a device, the search engine retrieves
information from the packet to constitute the key value and
matches the key value with rules in an ACL. When a matching
rule is found, the system stops the matching, and the packet
matches the rule.
If no matching rule is found, the packet does not match any
rule.
By default, a Huawei device permits a packet that matches no rule in an ACL.

Interface-based ACL
Match packets based on the rules defined on the inbound
interface of packets. You can run the traffic-filter command to
reference an interface-based ACL.

Basic ACL
Define rules based on the source IP address, VPN instance,
fragment flag, and time range of packets.
Advanced ACL
Define rules based on the source IP address, destination IP
address, IP preference, ToS, DSCP, IP protocol type, ICMP
type, TCP source port/destination port, and UDP source
port/destination port number of packets. An advanced ACL can
define more accurate, abundant, and flexible rules than a basic
ACL.
Layer 2 ACL
Define rules based on Ethernet frame header information in a
packet, including the source MAC address, destination MAC
address, and Ethernet frame protocol type.
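As a sketch of the difference between a basic and an advanced ACL (the numbers and addresses are hypothetical), the basic ACL matches on the source address only, while the advanced ACL can also match the protocol and ports, here blocking Telnet:

```
[Huawei] acl number 2001
[Huawei-acl-basic-2001] rule permit source 192.168.1.0 0.0.0.255
[Huawei] acl number 3001
[Huawei-acl-adv-3001] rule deny tcp source 192.168.1.0 0.0.0.255 destination 10.1.1.1 0 destination-port eq 23
```

Note that basic ACLs use the number range 2000-2999 and advanced ACLs use 3000-3999.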

ACL matching order


An ACL is composed of a list of rules. Each rule contains a
deny or permit clause. These rules may overlap or conflict.
One rule can contain another rule, but the two rules must be
different.
Devices support two types of matching order: configuration
order and automatic order. The matching order determines the
priorities of the rules in an ACL. Rule priorities resolve the
conflict between overlapping rules.
Automatic order
The automatic order follows the depth-first principle.
ACL rules are arranged in sequence based on rule precision.
For an ACL rule (where a protocol type, a source IP address
range, or a destination IP address range is specified), the
stricter the rule, the more precise it is considered. For example,
an ACL rule can be configured based on the wildcard of an IP
address. The smaller the wildcard, the narrower the range of
matched hosts and the stricter the ACL rule.
If rules have the same depth-first order, rules are matched in
ascending order of rule IDs.

Packet fragmentation supported by ACLs


In traditional packet filtering, only the first fragment of a packet
needs to match rules, while the other fragments are allowed to
pass through if the first fragment matches rules. In this
situation, network attackers may construct subsequent
fragments to launch attacks.
In an ACL rule, the fragment parameter indicates that the rule
is valid for all fragmented packets. The none-first-fragment
parameter indicates that the rule is valid only for non-first
fragmented packets but not for non-fragmented packets or the
first fragmented packet. The rules that do not contain
fragment and none-first-fragment parameters are valid for all
packets (including fragmented packets).
ACL time range
You can make ACL rules valid only at the specified time or
within a specified time range.

IP prefix list
An IP prefix list can contain multiple indexes. Each index has a
node. The system matches a route against nodes by the index
in ascending order. If the route matches a node, the system
does not match the route against the other nodes. If the route
does not match any node, the system filters the route.
Based on the matching prefix, an IP prefix list can implement
exact matching or matching within a specified mask length
range. You can configure greater-equal and less-equal to
specify the prefix mask length range. If neither keyword is
configured, the IP prefix list performs exact matching. That is,
only routes with the same mask length as that specified in the
IP prefix list are matched. If only greater-equal is configured,
the mask length range is [greater-equal-value, 32]. If only
less-equal is configured, the mask length range is [specified
mask length, less-equal-value].
The mask length range must satisfy: mask-length <= greater-equal-value <= less-equal-value <= 32.
Characteristics of an IP prefix list
If a route matches no entry in an IP prefix list, the route is
denied by default.
If the referenced IP prefix list does not exist, all routes are
permitted by default.
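The two matching modes can be sketched with the ip ip-prefix command (the list names and prefixes are hypothetical):

```
[Huawei] ip ip-prefix EXACT index 10 permit 10.0.0.0 16
[Huawei] ip ip-prefix RANGE index 10 permit 10.0.0.0 16 greater-equal 24 less-equal 32
```

EXACT matches only the route 10.0.0.0/16 itself, while RANGE matches any route within 10.0.0.0/16 whose mask length is 24 to 32.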

An AS_Path filter is only used to filter BGP routes to be advertised or


received based on the AS_Path attributes contained in the BGP
routes.
Because the number of the latest AS that a route passes through is added to
the leftmost position of the AS_Path list, configure an AS_Path filter with
caution:
If a route originating from AS 100 passes through AS 300, AS
200, and AS 500, and then reaches AS 600, the AS_Path
attribute of the route is (500 200 300 100).
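On a router in AS 600 in this example, an AS_Path filter could be sketched as follows (the filter number and peer address are hypothetical):

```
[R6] ip as-path-filter 1 permit _100$
[R6] bgp 600
[R6-bgp] peer 56.1.1.5 as-path-filter 1 import
```

The regular expression _100$ matches routes whose AS_Path ends with AS 100, that is, routes that originated in AS 100.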

A Community filter is only used to filter BGP routes to be advertised or


received based on the Community attributes contained in the BGP
routes.
The Community attribute includes basic and advanced community
attributes.
Self-defined community attributes and well-known
communities are basic community attributes.
RT and SOO in MPLS VPN are extended community attributes.
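As a sketch (filter number and policy name are hypothetical), a basic community filter matching the well-known no-export community can be referenced in a route policy that drops such routes and permits all others:

```
[Huawei] ip community-filter 1 permit no-export
[Huawei] route-policy FILTER-COMM deny node 10
[Huawei-route-policy] if-match community-filter 1
[Huawei-route-policy] quit
[Huawei] route-policy FILTER-COMM permit node 20
```

Node 20 contains no if-match clause, so it permits every route that was not denied by node 10.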

A route policy is used to filter routes and set attributes for routes. By
changing route attributes (including reachability), a route policy
changes the path that network traffic passes through.
A route policy is often used in the following scenarios:
Control route importing.
Using a route policy, you can prevent sub-optimal
routes and routing loops during the import of routes.
Control route receiving and advertising.
Using a route policy, you can receive or advertise
specified routes according to network requirements.
Set attributes for routes.
Using a route policy, you can modify the attributes of
routes to optimize a network.
Route policy principles
A route policy consists of multiple nodes. The system checks
routes against the nodes of a route policy in ascending order
of the node IDs. A node contains multiple if-match and apply
clauses. The if-match clauses define matching conditions of a
node, while apply clauses define the actions to be performed
on the routes that match if-match clauses. The relationship
between the if-match clauses of a node is AND. That is, a
route matches a node only when the route matches all the if-match clauses of the node. The relationship between the
nodes of a route policy is OR.

That is, a route matches a route policy as long as the route


matches one of its nodes. If a route does not match any node,
the route fails to match the route policy.
The relationship between the if-match clauses of a node in a
route policy is AND. The actions defined by apply clauses can
be performed on a route only when the route meets all the
matching conditions defined by the if-match clauses. The
relationship between the if-match clauses in the if-match
route-type and if-match interface commands is OR, but the
relationship between the if-match clauses in the two
commands and other commands is AND.
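The node and clause logic above can be sketched as a route policy (the names and values are hypothetical):

```
[Huawei] ip ip-prefix PFX10 index 10 permit 10.0.0.0 24
[Huawei] route-policy RP permit node 10
[Huawei-route-policy] if-match ip-prefix PFX10
[Huawei-route-policy] apply cost 100
[Huawei-route-policy] quit
[Huawei] route-policy RP permit node 20
```

A route matching the prefix list satisfies node 10 and has its cost set to 100; all other routes fall through to node 20, which has no if-match clause and therefore permits them unchanged.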

In the topology, dual-node bidirectional route advertisement is


implemented.
In the topology, R1 imports route 10.0.0.1/24 into OSPF. R3
imports OSPF routes into IS-IS, and R2 learns route
10.0.0.0/24 through IS-IS. In this case, R2 learns two routes to
10.0.0.0/24, through OSPF and IS-IS. R2 prefers the route
learned through IS-IS because IS-IS has a higher preference
(a smaller preference value) than OSPF external routes. Therefore, R2
reaches 10.0.0.0/24 along the sub-optimal path R4 → R3 → R1. To
optimize the path, modify the preference of the OSPF ASE route to be
higher than the IS-IS preference using a route policy. This modification
prevents R2 from using a sub-optimal route.
When the interface that connects R1 to network 10.0.0.0/24
goes down, R2 still imports route 10.0.0.0/24 into OSPF because
it has learned the route through IS-IS, even though the external
LSA has been aged out in the OSPF area. R1 and R3 then learn
route 10.0.0.0/24 again. When R2 accesses network 10.0.0.0/24,
traffic passes through R4 → R3 → R1 → R2, causing a routing
loop. In this scenario, use a tag to prevent routing loops.
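Tag-based loop prevention on one boundary router could be sketched as follows (the process IDs, tag value, and policy names are hypothetical; a mirrored configuration is needed on the other boundary router):

```
[R3] route-policy O2I permit node 10
[R3-route-policy] apply tag 100
[R3-route-policy] quit
[R3] route-policy I2O deny node 10
[R3-route-policy] if-match tag 100
[R3-route-policy] quit
[R3] route-policy I2O permit node 20
[R3] isis 1
[R3-isis-1] import-route ospf 1 route-policy O2I
[R3-isis-1] quit
[R3] ospf 1
[R3-ospf-1] import-route isis 1 route-policy I2O
```

Routes imported from OSPF into IS-IS are tagged with 100, and any IS-IS route carrying tag 100 is refused when importing back into OSPF. For IS-IS to carry tags, the wide cost style is required (cost-style wide).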

Control route receiving and advertising


Only necessary and valid routes are received, which limits the
routing table size and improves network security.
Topology description
R4 imports routes 10.0.X.0/24 into OSPF. According to service
requirements, R1 can only receive routes 10.0.0.0/24 and
10.0.1.0/24, while R2 can only receive routes 10.0.2.0/24 and
10.0.3.0/24. You can use a filter policy to meet this
requirement.
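On R1, for example, receiving only 10.0.0.0/24 and 10.0.1.0/24 could be sketched as follows (the prefix list name is hypothetical):

```
[R1] ip ip-prefix ALLOW index 10 permit 10.0.0.0 23 greater-equal 24 less-equal 24
[R1] ospf 1
[R1-ospf-1] filter-policy ip-prefix ALLOW import
```

The prefix 10.0.0.0/23 with the mask length fixed at 24 covers exactly 10.0.0.0/24 and 10.0.1.0/24. Note that in OSPF, filter-policy import only keeps the filtered routes out of the local routing table; the corresponding LSAs still flood through the area.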

Generally, only routing information is filtered, but link state information


is not filtered.
In OSPF, incoming and outgoing Type 3, Type 5, and Type 7
LSAs can be filtered.
Link-state routing protocols, such as OSPF and IS-IS, can filter
only incoming routes but not LSAs that carry these routes.
That is, OSPF and IS-IS do not add the filtered routes to the
local routing tables, but LSAs of these routes are still
transmitted in the OSPF or IS-IS area.
The routes imported from other protocols can also be filtered.
For example, you can use the filter-policy export command to
filter the imported routes to be advertised from RIP. Only the
external routes that pass the filtering can be converted into
AS-external LSAs and advertised. In this situation, other
neighbors do not have specified routes imported from RIP.
This configuration can only be performed in the outbound
direction.
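A sketch of this outbound filtering of imported routes on an OSPF boundary router (the ACL number, prefix, and process IDs are hypothetical):

```
[R3] acl number 2000
[R3-acl-basic-2000] rule deny source 172.16.1.0 0.0.0.255
[R3-acl-basic-2000] rule permit
[R3-acl-basic-2000] quit
[R3] ospf 1
[R3-ospf-1] import-route rip 1
[R3-ospf-1] filter-policy 2000 export rip 1
```

Only the imported RIP routes permitted by ACL 2000 are translated into AS-external LSAs; the route 172.16.1.0/24 is not advertised into the OSPF domain.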

Topology description
You can modify the Local_Pref attribute contained in a route
using a route policy to change the path of traffic. R2 learns the
route 10.0.0.0/24 from an EBGP peer and sets the Local_Pref value to
300; R3 learns the route 10.0.0.0/24 from an EBGP peer and sets the
Local_Pref value to 200. R1, R2, and R3 learn each other's routes
through IBGP, so AS 100 ultimately prefers R2 to reach 10.0.0.0/24.
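Setting Local_Pref on R2 could be sketched as follows (the policy name and peer address are hypothetical; R3 would be configured analogously with the value 200):

```
[R2] route-policy SET-LP permit node 10
[R2-route-policy] apply local-preference 300
[R2-route-policy] quit
[R2] bgp 100
[R2-bgp] peer 24.1.1.4 route-policy SET-LP import
```

Because a higher Local_Pref value is preferred and the attribute is propagated within the AS through IBGP, all routers in AS 100 select the exit through R2.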

PBR is a mechanism that selects routes based on user-defined policies.


It includes local PBR, interface PBR, and SPR. This course discusses
only local PBR.
IP unicast PBR has the following advantages:
Allows you to define policies for route selection according to
service requirements, which improves route selection flexibility
and controllability.
Sends different data flows through different links, which
improves link efficiency.
Uses low-cost links to transmit service data without affecting
service quality, which reduces the cost of enterprise data
services.

Matching process
If a device finds a matching local PBR node, the device
processes packets as follows:
Step 1 Checks whether the priority of packets has been set.
If so, the device applies the configured priority
to the packets and performs step 2.
If not, the device performs step 2.
Step 2 Checks whether an outbound interface has been
configured for the local PBR.
If so, the device sends packets from the
outbound interface.
If not, the device performs step 3.
Step 3 Checks whether next hops have been configured
for the local PBR. You can configure two next hops to
implement load balancing.
If so, the device sends packets to the next hops.
If not, the device searches the routing table for
a route based on the destination addresses of
packets. If no route is available, the devices
performs step 4.
Step 4 Checks whether the default outbound interface has
been configured for the local PBR.
If so, the device sends the packets from the
default outbound interface.
If not, the device performs step 5.

Step 5 Checks whether the default next hop has been


configured for the local PBR.
If so, the device sends the packets to the
default next hop.
If not, the device performs step 6.
Step 6 Discards the packets and generates
ICMP_UNREACH messages.
If the device does not find a matching local PBR node, it
searches the routing table for a route based on the destination
addresses of the packets and then sends the packets.
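The matching process above operates on a local PBR configuration such as the following sketch (the ACL number, policy name, and next-hop address are hypothetical):

```
[R1] acl number 3001
[R1-acl-adv-3001] rule permit ip source 192.168.1.0 0.0.0.255
[R1-acl-adv-3001] quit
[R1] policy-based-route PBR1 permit node 10
[R1-policy-based-route-PBR1-10] if-match acl 3001
[R1-policy-based-route-PBR1-10] apply ip-address next-hop 12.1.1.2
[R1-policy-based-route-PBR1-10] quit
[R1] ip local policy-based-route PBR1
```

Local PBR applies only to packets originated by the router itself; packets sourced from 192.168.1.0/24 are sent to next hop 12.1.1.2 regardless of the routing table, and all other packets are forwarded normally.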

Case description
IP addresses used to interconnect devices are as follows:
If RTX connects to RTY, interconnected addresses are
XY.1.1.X and XY.1.1.Y. Network mask is 24.

Command usage
The route-policy command creates a route policy and
displays the route-policy view.
View

System view
Parameters
route-policy route-policy-name { permit | deny } node node
route-policy-name: specifies the name of a route policy.
permit: specifies the matching mode of the route policy
as permit. In permit mode, if a route matches all the if-match clauses of a node, the route matches the route
policy, and the actions defined by the apply clauses of
the node are performed on the route; otherwise, the
route continues to be matched against the next node.
deny: specifies the matching mode of the route policy as
deny. In deny mode, if a route matches all the if-match
clauses of a node, the route is denied by the route
policy and is not matched against the next node.
node node: specifies the index of a node in the route
policy.
Precautions
A route policy is used to filter routes and set attributes for the routes
that match the route policy. A route policy consists of multiple nodes.

One node contains multiple if-match and apply clauses.


The if-match clauses define matching conditions for this node, and the
apply clauses define the actions to be performed on the routes that
meet the matching conditions. The relationship between if-match
clauses is AND. That is, a route must match all the if-match clauses of
a node. The relationship between the nodes of a route policy is OR.
That is, if a route matches a node, the route matches the route policy. If
a route does not match any node, the route does not match the route
policy.

Case description
The topology in this case is the same as that in the previous
case. Perform the configuration based on the configuration in
the previous case.
In requirement 2, use the least number of commands to
implement the optimal configuration.

Command usage
The filter-policy export command filters imported routes to be
advertised according to the policy.
View

System view
Parameters
filter-policy { acl-number | acl-name acl-name | ip-prefix ip-prefix-name } export [ protocol [ process-id ] ]
acl-number: specifies the number of a basic ACL.
acl-name acl-name: specifies the name of an ACL.
ip-prefix ip-prefix-name: specifies the name of an IP
prefix list.
protocol: specifies the protocol that advertises routing
information.
process-id: specifies a process ID when the protocol that
advertises routing information is RIP, IS-IS, or OSPF.
Precautions
After external routes are imported into OSPF using the import-route command, you can run the filter-policy export
command to filter the imported routes to be advertised.

This configuration allows only the external routes that meet the
matching conditions to be translated into Type 5 LSAs (AS-external-LSAs) and advertised. In this way, routing loops are
prevented.
You can specify protocol or process-id to filter the routes of a
specified protocol or process. If no protocol or process-id is
specified, OSPF filters all of the imported routes.

Case description
The topology in this case is the same as that in the previous
case. After meeting the requirements, check whether sub-optimal routes and routing loops exist.

Results
After routing protocols import routes from each other, R4
reaches 172.16.X.0/24 through a sub-optimal route (the RIP
route 172.16.X.0/24). This is because R4 learns both the OSPF
route 172.16.X.0/24 and the RIP route 172.16.X.0/24.
In fact, the optimal route is the OSPF route 172.16.X.0/24.
However, the preference value of OSPF external routes is 150,
while the preference value of RIP is 100, so R4 selects the RIP
route and reaches 172.16.X.0/24 through a sub-optimal path.

Case description
This case is an extension to the previous case. Perform the
configuration based on the configuration in the previous case.
To meet requirement 1, ensure that R4 accesses
172.16.X.0/24 through the optimal route instead of
a sub-optimal route.
To meet requirement 2, use tags to control dual-node
bidirectional route importing so as to prevent routing loops.

Results
If routes are not filtered during bidirectional route importing,
routing loops may occur when the network environment changes.
To avoid loops, ensure that each routing protocol imports only
the routes that originated in the other routing domain.
Compared with the previous configuration, the advantage of using
tags is that specific routing entries do not need to be listed.
When prefixes in a routing domain are added or removed, the
filtering still takes effect without manual intervention, so the
solution scales well.
Although the previous configuration prevents routing loops,
sub-optimal routes still exist.

The sub-optimal route occurs because, during dual-node bidirectional route


importing, one of R3 and R4 learns network 172.16.X.0/24 from both
OSPF and RIP. Because the preference value of OSPF external routes is
greater than that of RIP, R3 or R4 (one of them) reaches 172.16.X.0/24
through a sub-optimal route. To solve this problem, modify the preference
of these OSPF external routes to be smaller than that of RIP. Making the
preference value of all OSPF external routes smaller than that of OSPF
internal routes, however, is unreasonable.

Case description
This case is an extension to the previous case. Perform the
configuration based on the configuration in the previous case.

Results
When only route summarization is performed, two problems
exist: R5 learns the summary route, and a routing loop occurs
between R3 and R4 when R2 pings a nonexistent IP address.
The reason why the first problem occurs is as follows: After R3
and R4 learn the summary routes generated by themselves,
they import the summary routes into the RIP area again.
The reason why the second problem occurs is as follows: After
R3 and R4 learn the summary routes generated by themselves,
they add the summary routes to their routing tables.
To address the two problems, prevent R3 and R4 from
learning the summary routes generated by them and from
importing the routes into the OSPF area. That is, filter the
summary route learned from each other on R3 and R4.

Configure a filter policy on R3 and R4 to prevent them from receiving


the specified OSPF summary route, ensuring that the route is not
imported back into the RIP domain and preventing routing loops.

Case description
This case is an extension to the previous case. Perform the
configuration based on the configuration in the previous case.

Command usage
The policy-based-route command creates or modifies a PBR.
The ip local policy-based-route command enables local PBR.
View

policy-based-route: system view


ip local policy-based-route: system view
Parameters
policy-based-route policy-name { permit | deny } node node-id
policy-name: specifies the PBR name.
permit: performs PBR on the routes that meet matching
conditions.
deny: does not perform PBR on the routes that meet
matching conditions.
node-id: specifies the ID of a node.
ip local policy-based-route policy-name
policy-name: specifies a PBR name.

Precautions
When deploying PBR, do not configure a broadcast interface
such as an Ethernet interface as the outbound interface of
packets.

Configuration verification
Run the display bgp peer 1.1.1.1 orf ip-prefix command to
view prefix-based BGP ORF information received from a
specified peer.

Case description
IP addresses used to interconnect devices are designed as
follows:
If RTX connects to RTY, interconnected addresses are
XY.1.1.X and XY.1.1.Y. Network mask is 24.

Results
When R5 imports routes, accurate matching must be
performed.

Results
When you tracert a nonexistent IP address that belongs to
10.0.0.0/16, a routing loop occurs. This is because no route
pointing to Null0 is generated when OSPF generates a
summary route.

Results
You can configure static routes pointing to Null0 on R5 using a
command to prevent routing loops.
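The blackhole route can be sketched with a single static route on R5 (the summary prefix follows the case; the device name is hypothetical):

```
[R5] ip route-static 10.0.0.0 16 NULL0
```

Packets destined for sub-prefixes of 10.0.0.0/16 that have no more specific route are then discarded locally instead of being bounced back along the default path, which breaks the loop.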

Case description
This case is an extension to the previous case. Perform the
configuration based on the configuration in the previous case.
IP addresses used to interconnect devices are designed as
follows:
If RTX connects to RTY, interconnected addresses are
XY.1.1.X and XY.1.1.Y. Network mask is 24.
The IP address of R1 S0/0/0 is 12.1.1.1/24, and the IP
address of R2 S0/0/0 is 12.1.1.2/24. The IP address of
R1 S0/0/1 is 21.1.1.2/24, and the IP address of R2
S0/0/1 is 21.1.1.1/24.

Results
Use the ACL and route-policy commands to import the two
network segments into IS-IS; the filter-policy XXX
export command is usually used to filter imported routes.

Results
When you use tags to prevent routing loops, IS-IS must
support tags: the IS-IS cost type must be wide; otherwise,
IS-IS routes cannot be tagged.
To prevent the sub-optimal route, modify the preference of
OSPF external route 10.0.0.0/16 to be smaller than that of IS-IS routes.

Results
The configuration in this case prevents sub-optimal routes on R3 and
R4. Because of the difference in import time, one of R3 and R4
learns 10.0.0.0/16 from both IS-IS and OSPF. If R3 imports routes
earlier, R4 learns 10.0.0.0/16 from both IS-IS and OSPF and
compares their preferences: the preference value of OSPF external
routes is 150 and that of IS-IS is 15, so R4 prefers the IS-IS route
to reach network 10.0.0.0/16, which is a sub-optimal route.
Therefore, modifying the preference of route 10.0.0.0/16 on R4 to be
smaller than the IS-IS preference value eliminates the sub-optimal
route. Making the preference value of all OSPF external routes
smaller than that of OSPF internal routes is unreasonable, so modify
the preference only for this route.

Results
Use local PBR to meet this requirement.

VLAN technology brings the following benefits:


Limits broadcast domains. A broadcast domain is limited in a
VLAN. This saves bandwidth and improves network
processing capabilities.
Enhances network security. Packets from different VLANs are
separately transmitted. Hosts in a VLAN cannot directly
communicate with hosts in another VLAN.
Improves network robustness. A fault in a VLAN does not
affect hosts in other VLANs.
Flexibly sets up virtual groups. With VLAN technology, hosts in
different geographical areas can be grouped together. This
facilitates network construction and maintenance.
Topology description
S1 and S2 are located in different positions. Each switch
connects to two computers, and the computers belong to two
different VLANs. The dashed box indicates a VLAN.
By default, PCs in VLAN 2 cannot communicate with PCs in
VLAN 3. That is, broadcast packets are limited in a VLAN.

IEEE 802.1Q
IEEE 802.1Q is an Ethernet networking standard for a
specified Ethernet frame format. It adds the 4-byte 802.1Q Tag
field between the Source address and the Length/Type fields
of the original frame.
Subfields in the 802.1Q Tag field:
TPID: is short for Tag Protocol Identifier and indicates the
frame type, which has 2 bytes. The value 0x8100 indicates an
802.1Q-tagged frame. An 802.1Q-incapable device discards
the received 802.1Q frame.
PRI: is short for priority and indicates the frame priority, which
has 3 bits. The value ranges from 0 to 7. The greater the value,
the higher the priority. When QoS is deployed on a switch, the
switch first sends data frames with higher priority.
CFI: is short for Canonical Format Indicator and indicates
whether the MAC address is in canonical format. The value 0
indicates the MAC address in canonical format and the value 1
indicates the MAC address in non-canonical format. CFI is
used to differentiate Ethernet frames, Fiber Distributed Digital
Interface (FDDI) frames, and token ring network frames. The
value is 0 on the Ethernet.
VID: is short for VLAN ID and indicates the VLAN to which a
frame belongs, which has 12 bits.

Each frame sent by an 802.1Q-capable switch carries a VLAN ID.


In a VLAN, Ethernet frames are classified into the following types:
Tagged frame: frame with the 4-byte 802.1Q tag
Untagged frame: frame without the 4-byte 802.1Q tag

The following link types are available:


Access link: Usually connects a host to a switch. Generally,
a host does not need to know which VLAN it belongs to,
and host hardware cannot distinguish frames with VLAN
tags. Hosts therefore send and receive only untagged
frames along access links.
Trunk link: Usually connects a switch to another switch or
a router. Data of different VLANs is transmitted along a
trunk link. The two ends of a trunk link must be able to
distinguish frames using VLAN tags, and so only tagged
frames are transmitted along trunk links.
Topology description
A host does not need to know the VLAN to which it
belongs. It sends only untagged frames.
After receiving an untagged frame from a host, a
switching device determines the VLAN to which the frame
belongs based on the configured VLAN assignment
method such as interface information. The switching
device then processes the frame accordingly.
If a frame needs to be forwarded to another switching
device, the frame must be transparently transmitted
along a trunk link. Frames transmitted along trunk links
must carry VLAN tags to allow other switching devices to
properly forward the frame based on the VLAN
information.

After a switching device determines the outbound


interface of a frame and before the switching device
sends the frame to the destination host, the switching
device connected to the destination host removes the
VLAN tag from the frame to ensure that the host receives
an untagged frame.

Interface types
An access interface on a switch connects to an interface on a
host. It can only connect to access links.
The access interface allows only the VLAN whose ID is
the same as the Port Default VLAN ID (PVID).
If the access interface receives untagged frames from
the remote device, the switch adds the PVID to the
untagged frames.
Ethernet frames sent by the access interface are
always untagged frames.
A trunk interface on a switch connects to another switch. It can
only connect to trunk links.
The trunk interface allows frames from multiple VLANs
to pass through.
If the tag in the frame sent by the trunk interface is the
same as the PVID, the switch removes the tag from the
frame. The trunk interface sends untagged frames in
this situation only.
If the tag in the frame sent by the trunk interface is
different from the PVID, the switch directly sends the
frame.
A hybrid interface on a switch can connect to either a host or
another switch. It can connect to either access or trunk links.
The hybrid interface allows frames from multiple
VLANs to pass through and can be configured to send
frames tagged or untagged on a per-VLAN basis.
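The three interface types could be configured as follows (the interface numbers and VLAN IDs are hypothetical):

```
[S1] interface GigabitEthernet0/0/4
[S1-GigabitEthernet0/0/4] port link-type access
[S1-GigabitEthernet0/0/4] port default vlan 2
[S1] interface GigabitEthernet0/0/2
[S1-GigabitEthernet0/0/2] port link-type trunk
[S1-GigabitEthernet0/0/2] port trunk allow-pass vlan 2 3
[S1] interface GigabitEthernet0/0/5
[S1-GigabitEthernet0/0/5] port link-type hybrid
[S1-GigabitEthernet0/0/5] port hybrid untagged vlan 2
[S1-GigabitEthernet0/0/5] port hybrid tagged vlan 3
```

The access interface places untagged frames into VLAN 2, the trunk interface carries tagged frames of VLANs 2 and 3, and the hybrid interface sends VLAN 2 frames untagged and VLAN 3 frames tagged.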

Interface-based VLAN assignment


VLANs are assigned based on interface numbers.
The network administrator configures a PVID for each switch
interface, that is, an interface belongs to a VLAN by default.
When an untagged data frame reaches a switch
interface that has the PVID configured, the PVID is
added to the frame.
When a data frame carries a VLAN tag, the switch
does not add a VLAN tag to the data frame even if the
interface is configured with a PVID.
Different types of interfaces process VLAN frames in different
manners.
MAC address-based VLAN assignment
VLANs are assigned based on MAC addresses.
The network administrator needs to configure the mappings
between MAC addresses and VLAN IDs. When the switch
receives an untagged frame, it searches for the VLAN entry
matching the source MAC address of the frame and adds the
VLAN ID to the frame.
IP subnet-based VLAN assignment
When receiving an untagged frame, the switch adds a VLAN
tag to the packet based on the source IP address of the packet.

Protocol-based VLAN assignment


VLAN IDs are allocated to packets received on an interface
according to the protocol (suite) type and encapsulation format
of the packets. The network administrator needs to configure
the mappings between protocol types and VLAN IDs. When
the switch receives an untagged frame, it searches the
protocol-VLAN mapping table for a VLAN tag mapping the
protocol of the frame and adds it to the frame.
Protocol-based VLAN assignment supports IPv4, IPv6, IPX, and
AppleTalk (AT); the supported encapsulation types are Ethernet
II, 802.3 raw, 802.2 LLC, and 802.2 SNAP.
Policy-based VLAN assignment
Terminals' MAC addresses and IP addresses need to be
configured and associated with VLANs on the switch. Only
terminals matching conditions can be added to a specified
VLAN. After terminals matching conditions are added to the
VLAN, changes of the IP addresses or MAC addresses may
cause the terminals to be removed from the VLAN.

Topology description
To implement intra-communication in VLAN 2 and VLAN 3
through the trunk link between S1 and S2, add Port 2 on S1
and Port 1 on S2 to VLAN 2 and VLAN 3.
PC1 sends a frame to PC2 as follows:
The frame is first sent to Port 4 on S1.
Port 4 adds a tag to the frame. The VID field of the tag
is 2, that is, the ID of the VLAN to which Port 4 belongs.
S1 sends the frame to all interfaces in VLAN 2 except
Port 4 (assuming the MAC address table is empty).
Port 2 forwards the frame to S2.
After receiving the frame, S2 determines that the frame
belongs to VLAN 2 based on the tag. S2 sends the
frame to all interfaces in VLAN 2 except for Port 1.
Port 3 sends the frame to PC2.

Topology description
R1 is a Layer 3 switch supporting sub-interfaces, and S1 is a
Layer 2 switching device. LANs are connected using the
switched Ethernet interface on S1 and the routed Ethernet
interface on R1. To implement inter-VLAN communication,
perform the following operations:
Create two sub-interfaces on the Ethernet interfaces
connecting R1 and S1, and configure 802.1Q
encapsulation on sub-interfaces corresponding to
VLAN 2 and VLAN 3.
Configure IP addresses for sub-interfaces to ensure the
two sub-interfaces have reachable routes.
Configure Ethernet interfaces connecting S1 and R1 as
trunk or hybrid interfaces and configure them to allow
frames from VLAN 2 and VLAN 3 to pass through.
Configure the default gateway address as the IP
address of the sub-interface mapping the VLAN to
which the host belongs.
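The operations above could be sketched as follows (the interface numbers and IP addresses are hypothetical):

```
[R1] interface GigabitEthernet0/0/1.2
[R1-GigabitEthernet0/0/1.2] dot1q termination vid 2
[R1-GigabitEthernet0/0/1.2] ip address 192.168.2.1 255.255.255.0
[R1-GigabitEthernet0/0/1.2] arp broadcast enable
[R1-GigabitEthernet0/0/1.2] quit
[R1] interface GigabitEthernet0/0/1.3
[R1-GigabitEthernet0/0/1.3] dot1q termination vid 3
[R1-GigabitEthernet0/0/1.3] ip address 192.168.3.1 255.255.255.0
[R1-GigabitEthernet0/0/1.3] arp broadcast enable
[S1] interface GigabitEthernet0/0/1
[S1-GigabitEthernet0/0/1] port link-type trunk
[S1-GigabitEthernet0/0/1] port trunk allow-pass vlan 2 3
```

On VRP, arp broadcast enable is required on dot1q termination sub-interfaces so that they can respond to and send ARP broadcasts.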
PC1 communicates with PC2 as follows:
PC1 checks the IP address of PC2 and determines that
PC2 is in another VLAN.
PC1 sends an ARP Request packet to R1 to request
R1's MAC address.

After receiving the ARP Request packet, R1 returns an


ARP Reply packet in which the source MAC address is
the MAC address of the sub-interface mapping VLAN 2.
PC1 obtains R1's MAC address.
PC1 sends a packet in which the destination MAC
address is the MAC address of the sub-interface and
the destination IP address is PC2's IP address to R1.
After receiving the packet, R1 forwards the packet and
detects that the route to PC2 is a direct route. The
packet is forwarded by the sub-interface mapping
VLAN 3.
R1 as the gateway in VLAN 3 broadcasts an ARP
Request packet requesting PC2's MAC address.
After receiving the ARP Request packet, PC2 returns
an ARP Reply packet.
After receiving the ARP Reply packet, R1 sends the
packet from PC1 to PC2. All packets sent from PC1 to
PC2 are sent to R1 first for Layer 3 forwarding.

A routing table must have correct routing entries so that new


data flows can be correctly forwarded. You can deploy VLANIF
interfaces and routing protocols on Layer 3 switches to
implement Layer 3 connectivity.

Topology description
VLAN 2 and VLAN 3 are assigned. To implement inter-VLAN communication, perform the following operations:
Create two VLANIF interfaces on S1 and configure
IP addresses for them to ensure the two VLANIF
interfaces have reachable routes.
Configure the default gateway address as the IP
address of the VLANIF interface mapping the
VLAN to which the user host belongs.
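The VLANIF configuration on S1 could be sketched as follows (the IP addresses are hypothetical):

```
[S1] vlan batch 2 3
[S1] interface Vlanif 2
[S1-Vlanif2] ip address 192.168.2.1 255.255.255.0
[S1-Vlanif2] quit
[S1] interface Vlanif 3
[S1-Vlanif3] ip address 192.168.3.1 255.255.255.0
```

Hosts in each VLAN then use the VLANIF address of their own VLAN as the default gateway.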
PC1 communicates with PC2 as follows:
PC1 checks the IP address of PC2 and determines
that PC2 is in another VLAN.
PC1 sends an ARP Request packet to S1 to request
S1's MAC address.
After receiving the ARP Request packet, S1 returns
an ARP Reply packet in which the source MAC
address is the MAC address of VLANIF 2.
PC1 obtains S1's MAC address.

PC1 sends a packet in which the destination MAC


address is the MAC address of the VLANIF
interface and the destination IP address is PC2's IP
address to S1.
After receiving the packet, S1 forwards the packet
and detects that the route to PC2 is a direct route.
The packet is forwarded by VLANIF 3.
S1 as the gateway in VLAN 3 broadcasts an ARP
Request packet requesting PC2's MAC address.
After receiving the ARP Request packet, PC2
returns an ARP Reply packet.
After receiving the ARP Reply packet, S1 sends the
packet from PC1 to PC2. All packets sent from PC1
to PC2 are sent to S1 first for Layer 3 forwarding.
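The steps above can be reproduced with a minimal VLANIF configuration sketch on S1; the VLAN IDs and IP addresses are assumptions for illustration:

```
vlan batch 2 3
#
interface Vlanif 2
 ip address 192.168.2.254 255.255.255.0
#
interface Vlanif 3
 ip address 192.168.3.254 255.255.255.0
```

Both directly connected routes appear in the routing table once the VLANIF interfaces go Up, so no routing protocol is required in this single-switch case.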

VLAN aggregation, also known as the super-VLAN, partitions a broadcast domain using multiple VLANs on a physical network so
that different VLANs can belong to the same subnet.
Super-VLAN: is a set of multiple sub-VLANs. In a super-VLAN,
only Layer 3 interfaces are created, and no physical interface
exists.
Sub-VLAN: is used to isolate broadcast domains. In the sub-VLAN, only physical interfaces exist and Layer 3 VLAN
interfaces cannot be created. The super-VLAN is used to
implement Layer 3 switching.
A super-VLAN can contain one or more sub-VLANs. IP
addresses of hosts in sub-VLANs of the super-VLAN belong to
the subnet of the super-VLAN.

Topology description
The super-VLAN (VLAN 10) contains the sub-VLANs (VLAN 2
and VLAN 3).
Proxy ARP between sub-VLANs is enabled on S1. The
communication process is as follows:
After comparing PC2's IP address (1.1.1.20) with its IP
address, PC1 finds that both IP addresses are on the
same network segment. The ARP table of PC1
however has no corresponding entry for PC2.
PC1 broadcasts an ARP Request packet to request PC2's MAC address.
PC2 is not in VLAN 2, and so PC2 cannot receive the
ARP Request packet.
The gateway is enabled with proxy ARP between sub-VLANs. Therefore, after receiving the ARP Request packet from PC1, the gateway finds that PC2's IP address (1.1.1.20) is on the network segment of a directly connected interface. The gateway then broadcasts an ARP Request packet to all the other sub-VLAN interfaces to request PC2's MAC address.
After receiving the ARP Request packet, PC2 sends an
ARP Reply packet.
After receiving the ARP Reply packet from PC2, the gateway replies to PC1 with its own MAC address.
The ARP tables of both S1 and PC1 have
corresponding entries of PC2.

To send packets to PC2, PC1 first sends packets to the gateway, and then the gateway performs Layer 3
forwarding.

Topology description
The frame that enters S1 through Port 1 on PC1 is tagged with
the ID of VLAN 2. The VLAN ID, however, is not changed to
the ID of VLAN 10 on S1 even if VLAN 2 is the sub-VLAN of
VLAN 10. After passing through Port 3, which is a trunk
interface, this frame still carries the ID of VLAN 2. S1 discards
the frames of VLAN 10 that are sent to S1 by other devices
because S1 has no physical interface corresponding to VLAN
10.
A super-VLAN has no physical interface:
If you configure a super-VLAN and then a trunk interface, the
frames of a super-VLAN are filtered automatically according to
the VLAN range configured on the trunk interface.
If you first configure a trunk interface and configure the trunk
interface to allow all VLANs to pass through, you cannot
configure the super-VLAN on the device. The root cause is
that any VLAN with physical interfaces cannot be configured
as the super-VLAN. The trunk interface allows frames from all
VLANs to pass through, so no VLAN can be configured as a
super-VLAN.
On S1, only VLAN 2 and VLAN 3 are valid, and all frames are
forwarded in these VLANs.

Topology description
S2 is configured with super-VLAN 4, sub-VLAN 2, sub-VLAN 3,
and common VLAN 10. S1 is configured with two common
VLANs, namely, VLAN 10 and VLAN 20. S2 is configured with
the route to the network segment 1.1.3.0/24, and S1 is
configured with the route to the network segment 1.1.1.0/24.
PC1 in sub-VLAN 2 of super-VLAN 4 then needs to communicate with PC3, which is connected to S1.
After comparing PC3's IP address (1.1.3.2) with its IP address, PC1 finds that the two IP addresses are on
different network segments.
PC1 broadcasts an ARP Request packet to its gateway (S2) to request S2's MAC address.
After receiving the ARP Request packet, S2 checks the
mapping between the sub-VLAN and the super-VLAN,
and sends an ARP Reply packet to PC1 through sub-VLAN 2. The source MAC address in the ARP Reply
packet is the MAC address of VLANIF 4 corresponding
to super-VLAN 4.
PC1 learns S2's MAC address.
PC1 then sends a data packet to S2. The packet carries the MAC address of VLANIF 4 corresponding to super-VLAN 4 as the destination MAC address and 1.1.3.2 as the destination IP address.
After receiving the packet, S2 performs Layer 3 forwarding and sends it to S1, with 1.1.2.2 as the next hop address and VLANIF 10 as the outbound interface.
After receiving the packet, S1 performs Layer 3 forwarding and sends it to PC3 through the directly connected interface VLANIF 20.
The reply packet from PC3 reaches S2 after Layer 3 forwarding on S1.
After receiving the reply packet, S2 performs Layer 3 forwarding and sends it to PC1 through the super-VLAN.

The MUX VLAN consists of the principal VLAN and the subordinate VLAN. The subordinate VLAN is classified into the separate VLAN and the group VLAN.
Principal VLAN: A principal interface can communicate with all
interfaces in a MUX VLAN.
Subordinate VLAN
Separate VLAN: A separate interface can communicate
only with a principal interface and is isolated from other
types of interfaces. A separate VLAN must be bound to
a principal VLAN.
Group VLAN: A group interface can communicate with
a principal interface and other interfaces in the same
group VLAN, but cannot communicate with interfaces
in other group VLANs or a separate interface. A group
VLAN must be bound to a principal VLAN.

Topology description
The principal interface connects to the enterprise server;
separate interfaces connect to enterprise customers; group
interfaces connect to enterprise employees. In this manner,
enterprise customers and enterprise employees can access
the enterprise server, enterprise employees can communicate
with each other, enterprise customers cannot communicate
with each other, and enterprise customers and enterprise
employees cannot communicate with each other.

Case description
To meet requirement 2, configure VLAN 2 and VLAN 3 to be
permitted by the trunk link.

Command usage
The port link-type command sets the link type of an interface.
The port trunk allow-pass vlan command adds a trunk
interface to VLANs.
The port hybrid untagged vlan command adds a hybrid
interface to VLANs. Frames of the VLANs then pass through
the hybrid interface in untagged mode.
View
Interface view
Parameters
port link-type { access | dot1q-tunnel | hybrid | trunk }
Access: configures the link type of an interface as
access.
dot1q-tunnel: configures the link type of an interface as
QinQ.
hybrid: configures the link type of an interface as hybrid.
trunk: configures the link type of an interface as trunk.
Precautions
Before changing the link type of an interface, you need to
delete the VLAN configuration of the interface. That is, the
interface can join only VLAN 1.
If a specified VLAN does not exist, the port trunk allow-pass vlan command does not take effect. The port trunk allow-pass vlan command cannot be used on a member interface of an Eth-Trunk.
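A minimal sketch of the trunk configuration described above; the interface number is an assumption:

```
vlan batch 2 3
#
interface GigabitEthernet0/0/1
 port link-type trunk
 port trunk allow-pass vlan 2 3
```

Create VLAN 2 and VLAN 3 first; as noted in the precautions, the allow-pass command has no effect for a VLAN that does not exist.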

A hybrid interface can connect to either a user host or a switch. When a hybrid interface is connected to a user host, it must be added to VLANs in untagged mode because user hosts cannot process tagged frames. The port hybrid untagged vlan command is invalid on a member interface of an Eth-Trunk. A super-VLAN cannot be specified in the port hybrid untagged vlan command.
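A host-facing hybrid interface can be sketched as follows; the interface number and VLAN ID are assumptions:

```
interface GigabitEthernet0/0/2
 port link-type hybrid
 port hybrid pvid vlan 2
 port hybrid untagged vlan 2
```

The PVID ensures that untagged frames from the host enter VLAN 2, and the untagged setting strips the tag on egress toward the host.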

Case description
The topology is similar to that in slide 22. The difference is that
MAC addresses are identified. Assign VLANs based on MAC
addresses to meet the requirement.
Before configuring MAC address-based VLAN assignment,
ensure that the link type of the Layer 2 interface is hybrid.

Command usage
The mac-vlan mac-address command associates a MAC
address with a VLAN.
The mac-vlan enable command enables MAC address-based VLAN assignment on an interface.
Precautions
After a MAC address is associated with a VLAN, it cannot
be associated with other VLANs.
If MAC address-based assignment is enabled on an
interface:
When receiving an untagged packet, the interface
searches for the VLAN entry matching the source MAC
address of the packet. If a matching entry is found, the
interface forwards the packet based on the VLAN ID. If no
matching entry is found, the interface uses other
matching rules to forward the packet.
When receiving a tagged packet, the interface forwards
the packet based on the interface-based VLAN
assignment configuration.
MAC address-based assignment can be configured only
on hybrid interfaces.
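A configuration sketch for MAC address-based assignment; the MAC address, VLAN ID, and interface number are assumptions for illustration:

```
vlan 10
 mac-vlan mac-address 0022-0022-0022
#
interface GigabitEthernet0/0/1
 port link-type hybrid
 port hybrid untagged vlan 10
 mac-vlan enable
```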

Case description
The topology is similar to that in slide 22.
Before configuring IP subnet-based VLAN assignment, ensure
that the link type of the Layer 2 interface is hybrid.

Command usage
The ip-subnet-vlan command associates an IP subnet
with a VLAN.
The ip-subnet-vlan enable command enables IP subnet-based VLAN assignment on an interface.
Precautions
The IP subnet or IP address associated with a VLAN using the ip-subnet-vlan command cannot be a multicast network segment or a multicast address.
IP subnet-based assignment can be configured only on
hybrid interfaces.
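A configuration sketch for IP subnet-based assignment; the index, subnet, and interface number are assumptions, and the exact parameter format may vary with the device model and software version:

```
vlan 10
 ip-subnet-vlan 1 ip 192.168.10.0 24
#
interface GigabitEthernet0/0/1
 port link-type hybrid
 port hybrid untagged vlan 10
 ip-subnet-vlan enable
```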

Case description
Protocol-based assignment can be configured only on
hybrid interfaces.

Command usage
The protocol-vlan command associates a protocol with a
VLAN.
The protocol-vlan vlan command associates an interface with
a protocol-based VLAN.
Precautions
Protocol-based assignment can be configured only on hybrid
interfaces.
When protocol-based assignment is used on an interface, the
switch needs to parse the protocol type in the received packet
and convert it.
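A configuration sketch for protocol-based assignment; the VLAN ID and interface number are assumptions, and the set of protocol keywords varies by model:

```
vlan 10
 protocol-vlan ipv4
#
interface GigabitEthernet0/0/1
 port link-type hybrid
 port hybrid untagged vlan 10
 protocol-vlan vlan 10
```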

Case description
You can use the VLANIF interface or sub-interface to
implement communication between VLANs.

Command usage
The interface vlanif command creates a VLANIF interface
and displays the VLANIF interface view.
The dot1q termination vid command configures the single
VLAN ID of dot1q encapsulation on a sub-interface.
The arp broadcast enable command enables ARP
broadcast on a sub-interface.
Precautions
Before running the interface vlanif command, you must run
the vlan command to create a VLAN specified by vlan-id.

Case description
Configure VLAN aggregation to meet the requirements.

Command usage
The aggregate-vlan command configures a VLAN as a
super-VLAN.
The access-vlan command adds one or more sub-VLANs
to a super-VLAN.
Precautions
VLAN 1 cannot be configured as a super-VLAN.
The super-VLAN must be different from all its sub-VLANs.
A VLAN can be added to only one super-VLAN.
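The VLAN aggregation case can be sketched as follows; the VLAN IDs and IP address are assumptions, and the proxy ARP command name may vary by model and version:

```
vlan batch 2 3
vlan 10
 aggregate-vlan
 access-vlan 2 to 3
#
interface Vlanif 10
 ip address 1.1.1.1 255.255.255.0
 arp-proxy inter-sub-vlan-proxy enable
```

The proxy ARP command on VLANIF 10 enables hosts in sub-VLAN 2 and sub-VLAN 3 to communicate through the gateway, as described in the communication process earlier.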

Case description
Configure the MUX VLAN to meet the requirements.

Command usage
The mux-vlan command configures a VLAN as a principal
VLAN.
The subordinate group command configures subordinate
group VLANs for a principal VLAN.
The subordinate separate command configures a
subordinate separate VLAN for a principal VLAN.
Precautions for the principal VLAN
The super-VLAN, sub-VLAN, or subordinate VLAN cannot be
configured as a principal VLAN.
The VLAN where a VLANIF interface has been created cannot
be configured as a principal VLAN.
Precautions for the subordinate group VLAN
Before configuring a subordinate group VLAN, you must
configure a principal VLAN and enter the principal VLAN view.
The VLAN to be configured as a subordinate group VLAN
must have been created.
The VLAN to be configured as a subordinate group VLAN
cannot have a VLANIF interface configured or be configured
as a super-VLAN.
Before running the undo subordinate group command to delete a subordinate group VLAN to which interfaces have been added, delete the interfaces from the subordinate group VLAN.
A subordinate group VLAN must be different from the principal
VLAN.

A subordinate group VLAN must be different from a subordinate separate VLAN.
Precautions for the subordinate separate VLAN
Before configuring a subordinate separate VLAN, you must
configure a principal VLAN and enter the principal VLAN view.
The VLAN to be configured as a subordinate separate VLAN
must have been created.
The VLAN to be configured as a subordinate separate VLAN
cannot have a VLANIF interface configured or be configured
as a super-VLAN.
Before running the undo subordinate separate command to delete a subordinate separate VLAN to which interfaces have been added, delete the interfaces from the subordinate separate VLAN.
A subordinate separate VLAN must be different from the
principal VLAN.
A subordinate separate VLAN must be different from a
subordinate group VLAN.
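A MUX VLAN configuration sketch matching the enterprise scenario described earlier; the VLAN IDs and interface number are assumptions:

```
vlan batch 2 3 4
vlan 4
 mux-vlan
 subordinate separate 2
 subordinate group 3
#
# server-facing interface in the principal VLAN
interface GigabitEthernet0/0/1
 port link-type access
 port default vlan 4
 port mux-vlan enable
```

The port mux-vlan enable command must also be applied to the customer-facing interfaces in the separate VLAN and the employee-facing interfaces in the group VLAN.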

Check whether MAC address entries on the switch are correct. Run the display mac-address command on the switch to check whether the MAC addresses, interfaces, and VLANs in the learned MAC address entries are correct. If the learned MAC address entries are incorrect, run the undo mac-address mac-address vlan vlan-id command on the interface to delete the existing entries so that the switch can learn MAC address entries again.

Case description
To implement communication between VLANs through RIPv2,
configure at least two VLANIF interfaces on the switch.

Result
Perform the ping operation. PC1 in VLAN 2 and PC2 in VLAN 3 can communicate with each other.


Proxy ARP
Routed proxy ARP: Routed proxy ARP enables network
devices on the same network segment but on different
physical networks to communicate.
Intra-VLAN proxy ARP: If two hosts belong to the same VLAN
where user isolation is configured, enable intra-VLAN proxy
ARP on an interface associated with the VLAN to allow the
hosts to communicate.
Inter-VLAN proxy ARP: If two hosts belong to different VLANs,
enable inter-VLAN proxy ARP on interfaces associated with
the VLANs to implement Layer 3 communication between the
two hosts.
Topology Description
Routed proxy ARP
The IP addresses of PC1 and PC2 are on the same
network segment. When PC1 needs to communicate with PC2, PC1 broadcasts an ARP Request packet, requesting the MAC address of PC2. However, PC1
and PC2 are on different physical networks (in different
broadcast domains). PC2 therefore cannot receive the
ARP Request packet sent from PC1 and does not
respond with an ARP Reply packet. To solve this
problem, enable proxy ARP on S1.

After receiving the ARP Request packet, S1 searches for a routing entry corresponding to PC2. If the routing
entry corresponding to PC2 exists, S1 responds to the
ARP Request packet with its own MAC address. PC1
forwards data based on the MAC address of S1. S1
functions as the proxy of PC2.
Intra-VLAN proxy ARP
PC1 cannot communicate with PC2 in the same VLAN
because interface isolation is configured on the
interface of S1 connected to PC1 and PC2. To solve
this problem, enable intra-VLAN proxy ARP on the
interfaces of S1. After S1's interface connected to PC1
receives an ARP Request packet destined for another
address, S1 does not discard the packet but searches
for the ARP entry corresponding to PC2. If the ARP
entry corresponding to PC2 exists, S1 sends its MAC
address to PC1 and forwards packets sent from PC1 to
PC2. S1 functions as the proxy of PC2.
Inter-VLAN proxy ARP
This function is used in VLAN aggregation. Refer to the
VLAN documentation.
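The three proxy ARP variants are typically enabled per interface. A sketch follows; the VLANIF numbers are assumptions, and command names may vary by model and software version:

```
interface Vlanif 10
 # routed proxy ARP
 arp-proxy enable
#
interface Vlanif 20
 # intra-VLAN proxy ARP (for port-isolated hosts)
 arp-proxy inner-sub-vlan-proxy enable
#
interface Vlanif 30
 # inter-VLAN (sub-VLAN) proxy ARP, used with VLAN aggregation
 arp-proxy inter-sub-vlan-proxy enable
```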

Gratuitous ARP provides the following functions:


Checks for duplicate IP addresses: Normally, a host does not
receive an ARP Reply packet after sending an ARP Request
packet with the destination address as its own IP address. If
the host receives an ARP Reply packet, another host has the
same IP address.
Advertises a new MAC address: If the MAC address of a host
changes because its network adapter is replaced, the host
sends a gratuitous ARP packet to notify all hosts of the change
before the ARP entry is aged out.
Notifies of an active/standby switchover in a VRRP group:
After an active/standby switchover is performed, the master
switch sends a gratuitous ARP packet in the VRRP group to
notify of the switchover.

After the system is reset or an interface card is hot swapped or reset, dynamic MAC address entries are lost, but static and blackhole entries are retained.

Secure MAC addresses are classified into the following types:


Secure dynamic MAC address: is learned on an
interface where port security is enabled but the sticky
MAC function is disabled. After port security is enabled
on an interface, dynamic MAC address entries that
have been learned on the interface are deleted and
MAC address entries learned subsequently turn into
secure dynamic MAC address entries. Secure dynamic
MAC addresses will not be aged out by default. After
the switch restarts, secure dynamic MAC addresses
are lost and need to be learned again.
Sticky MAC address: is learned on an interface where
both port security and the sticky MAC function are
enabled. Sticky MAC addresses will not be aged out.
After you save the configuration and restart the switch,
sticky MAC addresses still exist.
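A port security sketch combining the sticky MAC function with a limit of one address; the interface number and protect action are assumptions:

```
interface GigabitEthernet0/0/1
 port-security enable
 port-security mac-address sticky
 port-security max-mac-num 1
 port-security protect-action restrict
```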

MAC address anti-flapping


Increasing the MAC address learning priority of an interface:
When the same MAC address entry is learned by interfaces
with different priorities, the MAC address entry learned by the
interface with the highest priority overwrites the one learned by
other interfaces.
Preventing MAC address overwriting on interfaces with the same priority: If the priority of an interface on a bogus device is the same as that on the authorized device, the MAC address of the bogus device learned later does not overwrite the correct MAC address. However, if the authorized device powers off, the MAC address of the bogus device is learned; after the authorized device powers on again, its correct MAC address cannot be learned.
Topology description
You can set a high MAC address learning priority on Port1 to
prevent PC3 from using the MAC address of PC1 to attack the
switch.
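On switches that support it, the MAC address learning priority of an interface can be raised with a command along the following lines; the command name and value range vary by model, and the value here is an assumption:

```
interface GigabitEthernet0/0/1
 # higher value = higher MAC address learning priority
 mac-learning priority 2
```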

Topology description
No loop prevention protocol is used on the switching network.
If S2 and S4 are incorrectly connected with a network cable, a
loop occurs between S2, S3, and S4. When a broadcast
packet is sent, the packet is forwarded to S3 and received by
Port1 on S1. When MAC address flapping detection is
configured on Port1, S1 detects that the source MAC address
of the broadcast packet flaps between interfaces. If the MAC
address flaps between interfaces frequently, S1 considers that
MAC address flapping occurs. The flapping interface on S1 can then enter the error-down state or be removed from the VLAN.
MAC address flapping detection
Other dynamic VLAN technologies cannot be used with the
removal of an interface from the VLAN where MAC address
flapping occurs.

Link aggregation has the following advantages:


Increased bandwidth: The bandwidth of the link aggregation
interface is the sum of bandwidth of member interfaces.
Higher reliability: When the physical link of a member interface
fails, the traffic can be switched to another available member
link, improving reliability of the link aggregation interface.
Load balancing: In a Link Aggregation Group (LAG), traffic is
load balanced among active member interfaces.
Basic concepts of Ethernet link aggregation
Eth-Trunk: An LAG is the logical link bundled by many
Ethernet links, and is short for Eth-Trunk.
Member interfaces and member links: The interfaces that
constitute an Eth-Trunk are member interfaces. The link
corresponding to a member interface is member link.
Active and inactive interfaces and links:
Member interfaces are classified into active interfaces
that forward data and inactive interfaces that do not
forward data.
Links connected to active interfaces are called active
links, and links connected to inactive interfaces are
called inactive links.

Upper threshold for the number of active interfaces: This setting guarantees higher network reliability. When the number of active member interfaces reaches the upper threshold, additional member interfaces are set to Down and used as backup links.
Lower threshold for the number of active interfaces: This setting ensures the minimum bandwidth of an Eth-Trunk. When the number of active interfaces falls below this threshold, the Eth-Trunk goes Down.

Forwarding principle
An Eth-Trunk interface is assumed to be a physical interface at
the MAC sub-layer. Therefore, frames transmitted at the MAC
sub-layer only need to be delivered to the Eth-Trunk module.

Eth-Trunk forwarding entries:


HASH-KEY value: is calculated through the hash algorithm on
the MAC address or IP address in the packet.
Interface number: Eth-Trunk forwarding entries are relevant to
the number of member interfaces in an Eth-Trunk. Different
HASH-KEY values are mapped to different outbound
interfaces.
Figure description
For example, if three physical interfaces, 1, 2, and 3, are
bundled into an Eth-Trunk, the Eth-Trunk forwarding table
contains three entries, as shown in the preceding figure. In the
Eth-Trunk forwarding table, the HASH-KEY values are 0, 1, 2,
3, 4, 5, 6, 7, and the corresponding interface numbers are 1, 2,
3, 1, 2, 3, 1, 2.

Forwarding process
The Eth-Trunk module receives a frame from the MAC sub-layer, and then extracts its source MAC address/IP address or
destination MAC address/IP address according to the load
balancing mode.
The Eth-Trunk module calculates the HASH-KEY value using
the hash algorithm.
Based on the HASH-KEY value, the Eth-Trunk module
searches the Eth-Trunk forwarding table for the interface
number, and then sends the frame from the corresponding
interface.

Mis-sequencing in common load balancing mode


Because there are multiple physical links between devices of
an Eth-Trunk, the first data frame of the same data flow is
transmitted on one physical link, and the second data frame
may be transmitted on another physical link. In this case, the
second data frame may arrive at the peer device earlier than
the first data frame. As a result, packet mis-sequencing occurs.
Eth-Trunk load balancing
The Eth-Trunk uses the load balancing mechanism. This
mechanism uses the hash algorithm to calculate the address
in a data frame and generates a hash key value. The system
then searches for the outbound interface in the Eth-Trunk
forwarding table based on the generated hash key value. Each
MAC or IP address corresponds to a hash key value, so the
system uses different outbound interfaces to forward data.
This mechanism ensures that frames of the same data flow
are forwarded on the same physical link and implements flow-based load balancing. Flow-based load balancing ensures the
sequence of data transmission, but cannot guarantee the
bandwidth use efficiency.

Manual load balancing mode


If an active link fails, the other active links load balance the
traffic evenly. If a high link bandwidth between two directly
connected devices is required but the device does not support
the LACP protocol, you can use the manual load balancing
mode.
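A manual load balancing Eth-Trunk can be sketched as follows; the interface numbers and load balancing keyword are assumptions:

```
interface Eth-Trunk 1
 mode manual load-balance
 load-balance src-dst-mac
#
interface GigabitEthernet0/0/1
 eth-trunk 1
#
interface GigabitEthernet0/0/2
 eth-trunk 1
```

The same Eth-Trunk number and load balancing mode must be configured on the peer device.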
LACP mode
LACP uses a standard negotiation mechanism for switching
devices. LACP enables switching devices to automatically
create and enable aggregated links based on their
configurations. After aggregated links are created, LACP
maintains the link status. If an aggregated link's status
changes, LACP automatically adjusts or disables the link
aggregation.

LACP concepts
LACP system priority: The LACP system priority (default value
of 32768) is used to differentiate priorities of devices at both
ends of an Eth-Trunk. In LACP mode, active interfaces
selected by both devices must be consistent; otherwise, the
LAG cannot be established. To keep active interfaces
consistent at both ends, set a higher priority for one end.

In this manner, the other end selects active member interfaces based on the selection of the peer. The smaller the
LACP system priority value, the higher the LACP system
priority. When LACP system priorities are the same, the device
with the smaller MAC address functions as the Actor.
LACP interface priority: The LACP interface priority (default
value of 32768) is used to determine whether a member
interface can be selected as an active interface. The smaller
the LACP interface priority value, the higher the LACP
interface priority.
In LACP mode, LACP determines active and inactive links in
an LAG. This mode is also called M:N mode, where M refers to
the number of active links and N refers to the number of
backup links. This mode guarantees high reliability and allows
load balancing to be carried out across M active links.

LACP implementation
After member interfaces are added to an Eth-Trunk in LACP
mode, each end sends LACPDUs to inform its peer of its
system priority, MAC address, interface priority, interface
number, and keys. After being informed, the peer compares
this information with that saved on itself, and selects which
interfaces to be aggregated. Both ends determine active
interfaces and links.
Negotiation process
Devices at both ends send LACPDUs to each other.
Create an Eth-Trunk in LACP mode on S1 and S2 and
add member interfaces to the Eth-Trunk. The member
interfaces are then enabled with LACP, and devices at
both ends send LACPDUs to each other.
Determine the Actor and active links.
When S2 receives LACPDUs from S1, S2 checks and
records information about S1 and compares system
priorities. If the system priority of S1 is higher than that
of S2, S1 acts as the Actor.
After devices at both ends select the Actor, they select
active interfaces according to the priorities of the
Actor's interfaces.

LACP preemption
E1 becomes faulty, and then recovers. When E1 fails,
E3 replaces E1 to transmit services. After E1 recovers,
if LACP preemption is not enabled on the Eth-Trunk,
E1 still retains a backup state. If LACP preemption is
enabled on the Eth-Trunk, E1 becomes the active
interface and E3 becomes the backup interface
because E1 has higher priority than E3.
LACP preemption delay
When LACP preemption occurs, the backup link waits
for a given period of time before switching to the active
state.
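An LACP-mode sketch that reflects the priority, threshold, and preemption concepts above; the values and interface numbers are assumptions, and the mode keyword may be lacp or lacp-static depending on the software version:

```
# system view: make this end the Actor
lacp priority 100
#
interface Eth-Trunk 1
 mode lacp-static
 max active-linknumber 2
 lacp preempt enable
 lacp preempt delay 20
#
interface GigabitEthernet0/0/1
 eth-trunk 1
 # prefer this member as an active link
 lacp priority 100
```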

GVRP
GVRP is based on GARP and is used to maintain VLAN
attributes dynamically on devices. Through GVRP, VLAN
attributes of one device can be propagated throughout the
entire switching network. GVRP enables network devices to
dynamically deliver, register, and propagate VLAN attributes,
reducing the workload of network administrators and ensuring
correct configuration.
GVRP applies to only trunk links.
GVRP uses the multicast MAC address of 01-80-C2-00-00-21.
Participant
On a device running GVRP, each GVRP-enabled port is
considered as a GVRP participant.
VLAN registration and deregistration
GVRP implements automatic registration and deregistration of
VLAN attributes.
VLAN registration: adds an interface to a VLAN.
VLAN deregistration: removes an interface from a
VLAN.
GVRP registers and deregisters VLAN attributes through
attribute declarations and reclaim declarations:
When an interface receives a VLAN attribute
declaration, it registers the VLAN specified in the
declaration. That is, the interface is added to the VLAN.
When an interface receives a VLAN attribute reclaim
declaration, it deregisters the VLAN specified in the
declaration. That is, the interface is removed from the
VLAN.
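A GVRP configuration sketch; remember that GVRP applies only to trunk links. The interface number is an assumption:

```
# system view
gvrp
#
interface GigabitEthernet0/0/1
 port link-type trunk
 port trunk allow-pass vlan all
 gvrp
 gvrp registration normal
```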

GARP participants exchange attribute information by sending messages. GVRP messages fall into Join, Leave, and LeaveAll
messages.
Join message: When a GARP participant requires that other
devices register its attributes, receives Join messages from
other GARP participants, or have attributes configured
statically, it sends Join messages.
Leave message: A GARP participant sends Leave messages
to have its attributes deregistered from other devices. The
GARP participant also sends Leave messages when
receiving Leave messages from other GARP participants or
when attributes are manually deregistered.
LeaveAll message: A GARP participant sends LeaveAll
messages to deregister all its attributes from all the other
GARP participants. LeaveAll messages are used to
periodically delete garbage attributes. For example, a garbage
attribute may be created when a device, due to a sudden loss of power, fails to send the Leave message that notifies other devices to deregister an attribute it has removed.

Join timer
To ensure that a Join message is reliably transmitted to
other GARP participants, a GARP participant may send the
Join message twice. When sending the first Join message,
the GARP participant starts the Join timer. If a Join
message is received before the Join timer expires, the
GARP participant does not send the second Join message.
If not, the GARP participant re-sends the Join message.
The Join timer is configured on a per-port basis.
Hold timer
When you configure an attribute on a participant or when
the participant receives a request message, the
participant does not propagate the message to the other
devices immediately. Instead, it sends the request
messages received within a period of time and sends
them in one GARP PDU. This period of time is specified by
the Hold timer. By making full use of the data portion of
GARP PDUs to send multiple messages in one packet, the
mechanism reduces the number of transmitted packets
and contributes to network stability.
The Hold timer value must be no greater than half of the
Join timer value.

Leave timer
Upon receiving a Leave or LeaveAll message, a GARP
participant starts its Leave timer. If it receives no Join message
containing the attribute carried in the Leave or LeaveAll
message when the Leave timer expires, it deregisters the
attribute.
The Leave timer value is twice that of the Join timer value.
LeaveAll timer
Upon startup, a GARP participant starts the LeaveAll timer.
When the LeaveAll timer expires, the GARP participant sends
out a LeaveAll message, and then restarts the LeaveAll timer
to start another cycle.
When receiving a LeaveAll message, a GARP participant restarts all timers, including the LeaveAll timer.
If LeaveAll timers of multiple devices expire at the same time,
multiple LeaveAll messages will be sent at the same time,
creating unnecessary traffic. To avoid this problem, the actual
LeaveAll timer value of a participant is a random value
between the LeaveAll timer value and the LeaveAll timer value
multiplied by 1.5. A LeaveAll event is equivalent to
deregistering all attributes network wide by sending Leave
messages.
The LeaveAll timer value must be larger than the Leave timer value.

One-way registration of VLAN attributes


Manually create static VLAN 2 on S1. In response to this
action, GVRP automatically assigns the GVRP-enabled ports
on S2 and S3 to VLAN 2 through one-way registration. The
process is as follows:
After VLAN 2 is created on S1, E1 on S1 starts the Join
timer and Hold timer. When the Hold timer expires, S1
sends the first JoinEmpty message to S2. When the
Join timer expires, E1 restarts the Hold timer. When the
Hold timer expires again, E1 sends the second JoinEmpty message.
After E2 on S2 receives the first JoinEmpty message,
S2 creates dynamic VLAN 2 and adds E2 to VLAN 2.
In addition, S2 requests E3 to start the Join timer and
Hold timer. When the Hold timer expires, E3 sends the
first JoinEmpty message to S3. When the Join timer
expires, E3 restarts the Hold timer. When the Hold
timer expires again, E3 sends the second JoinEmpty
message. After E2 receives the second JoinEmpty
message, S2 does not take any action because E2 has
been added to VLAN 2.

After E4 of S3 receives the first JoinEmpty message, S3 creates dynamic VLAN 2 and adds E4 to VLAN 2.
After E4 receives the second JoinEmpty message, S3
does not take any action because E4 has been added
to VLAN 2.
Every time the LeaveAll timer expires or a LeaveAll
message is received, each device restarts the LeaveAll
timer, Join timer, Hold timer, and Leave timer. E1 then
repeats step 1 to send JoinEmpty messages. E3 of S2
sends JoinEmpty messages to S3 in the same way.

Two-way registration of VLAN attributes


After one-way registration is complete, E1, E2, and E4 are
added to VLAN 2 but E3 is not added to VLAN 2 because only
interfaces receiving a JoinEmpty or JoinIn message can be
added to dynamic VLANs. To transmit traffic of VLAN 2 in both
directions, VLAN registration from S3 to S1 is required. The
process is as follows:
After one-way registration is complete, static VLAN 2 is
created on S3 (the dynamic VLAN is replaced by the
static VLAN). E4 on S3 starts the Join timer and Hold
timer. When the Hold timer expires, E4 on S3 sends
the first JoinIn message (because it has registered
VLAN 2) to S2. When the Join timer expires, E4
restarts the Hold timer. When the Hold timer expires,
E4 sends the second JoinIn message.
After E3 on S2 receives the first JoinIn message, S2
adds E3 to VLAN 2 and requests E2 to start the Join
timer and Hold timer. When the Hold timer expires, E2
sends the first JoinIn message to S1. When the Join
timer expires, E2 restarts the Hold timer. When the
Hold timer expires again, E2 sends the second JoinIn
message. After E3 receives the second JoinIn
message, S2 does not take any action because E3 has
been added to VLAN 2.
When S1 receives the JoinIn message, it stops sending
JoinEmpty messages to S2. Every time the LeaveAll
timer expires or a LeaveAll message is received, each
device restarts the LeaveAll timer, Join timer, Hold
timer, and Leave timer. E1 on S1 sends a JoinIn
message to S2 when the Hold timer expires.
S2 sends a JoinIn message to S3.
After receiving the JoinIn message, S3 does not create
dynamic VLAN 2 because static VLAN 2 has been
created.
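The registration rules driving the exchanges above can be condensed into a minimal sketch. This is a simplification: in the real protocol, deregistration happens only after the Leave timer expires, and the function name is illustrative:

```python
def on_garp_message(registered: set, msg: str, vlan: int) -> set:
    """Minimal GARP registrar sketch for one port: JoinEmpty/JoinIn
    register an attribute (here, a VLAN ID); LeaveEmpty/LeaveIn/LeaveAll
    deregister it."""
    updated = set(registered)
    if msg in ("JoinEmpty", "JoinIn"):
        updated.add(vlan)       # port is added to the dynamic VLAN
    elif msg in ("LeaveEmpty", "LeaveIn", "LeaveAll"):
        updated.discard(vlan)   # attribute is deregistered
    return updated

# E2 receives the first JoinEmpty for VLAN 2 and registers it; the
# second JoinEmpty changes nothing because VLAN 2 is already registered.
state = on_garp_message(set(), "JoinEmpty", 2)
state = on_garp_message(state, "JoinEmpty", 2)
```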

One-way deregistration of VLAN attributes


When VLAN 2 is not required on devices, the devices can
deregister VLAN 2. The process is as follows:
After static VLAN 2 is manually deleted from S1, E1 on
S1 starts the Hold timer. When the Hold timer expires,
E1 sends a LeaveEmpty message to S2. E1 needs to
send only one LeaveEmpty message.
After E2 on S2 receives the LeaveEmpty message, it
starts the Leave timer. When the Leave timer expires,
E2 deregisters VLAN 2. Then E2 is deleted from VLAN
2, but VLAN 2 is not deleted from S2 because E3 is still
in VLAN 2. At this time, S2 requests E3 to start the
Hold timer and Leave timer. When the Hold timer
expires, E3 sends a LeaveIn message to S3. Static
VLAN 2 is not deleted from S3, so E3 can receive the
JoinIn message sent from E4 after the Leave timer
expires. In this case, S1 and S2 can still learn dynamic
VLAN 2.
After S3 receives the LeaveIn message, E4 is not
deleted from VLAN 2 because VLAN 2 is a static VLAN
on S3.
Two-way deregistration of VLAN attributes
To delete VLAN 2 from all devices, two-way deregistration is
required. The process is as follows:

After static VLAN 2 is manually deleted from S3, E4 on


S3 starts the Hold timer. When the Hold timer expires,
E4 sends a LeaveEmpty message to S2.
After E3 on S2 receives the LeaveEmpty message, it
starts the Leave timer. When the Leave timer expires,
E3 deregisters VLAN 2. Then E3 is deleted from
dynamic VLAN 2, and dynamic VLAN 2 is deleted from
S2. At this time, S2 requests E2 to start the Hold timer.
When the Hold timer expires, E2 sends a LeaveEmpty
message to S1.
After E1 on S1 receives the LeaveEmpty message, it
starts the Leave timer. When the Leave timer expires,
E1 deregisters VLAN 2. Then E1 is deleted from
dynamic VLAN 2, and dynamic VLAN 2 is deleted from
S1.

Manually configured VLANs are called static VLANs, and VLANs
created using GVRP are called dynamic VLANs.

Case description
To enable PC1 and PC2 whose interfaces are isolated in
VLAN 2 to communicate with each other, enable intra-VLAN
proxy ARP on S1.

Command usage
The port-isolate enable command enables port isolation.
The arp-proxy inner-sub-vlan-proxy enable command
enables intra-VLAN proxy ARP.

View
Interface view
Parameters
port-isolate enable [ group group-id ]
group-id: specifies the ID of a port isolation group. The
default value is 1.
Precautions
You can use the display port-isolate command to view the
port isolation group configuration.

Case description
Preemption needs to be enabled to meet requirement 3.

Command usage
The mode command configures the working mode of an Eth-Trunk.
The eth-trunk command adds an interface to an Eth-Trunk.
The load-balance command sets a load balancing mode of an
Eth-Trunk.
The max active-linknumber command sets the upper
threshold for the number of active member links on an Eth-Trunk.
The lacp priority command sets the LACP system or interface
priority.
The lacp preempt enable command enables priority
preemption in static LACP mode.
Precautions
When adding an interface to an Eth-Trunk, pay attention to the
following points:
An Eth-Trunk contains a maximum of 8 member
interfaces.
A member interface cannot be configured with any
service or static MAC address.
The link type of the member interface added to the Eth-Trunk must be hybrid.

An Eth-Trunk cannot be nested; that is, its member
interface cannot be an Eth-Trunk.
An Ethernet interface can be added to only one Eth-Trunk.
To add the Ethernet interface to another Eth-Trunk, delete
it from the original Eth-Trunk first.
Member interfaces of an Eth-Trunk must be of the
same type. That is, FE and GE interfaces cannot join
the same Eth-Trunk.
Ethernet interfaces on different LPUs can join the same
Eth-Trunk.
The remote interface directly connected to the local
Eth-Trunk member interface must also be bundled into
an Eth-Trunk; otherwise, the two ends cannot
communicate.
When member interfaces use different rates,
congestion may occur on the low-rate interface,
causing packet loss.
After interfaces are added to an Eth-Trunk, MAC
addresses are learned on the Eth-Trunk but not the
member interfaces.
When all member interfaces of an Eth-Trunk work in
half-duplex mode, the Eth-Trunk cannot negotiate an
Up state.

Case description
Deploy GVRP to meet requirement 2.

Command usage
The gvrp command enables GVRP globally or on an interface.
Precautions
Before enabling GVRP on an interface, you must set the link
type of the interface to trunk.
The display gvrp vlan-operation command displays the
dynamic VLANs to which an interface is added.

PPP includes three protocols:


Link Control Protocol (LCP): is used to establish, monitor, and
tear down PPP data links. LCP can automatically detect the
link environment, for example, check whether there are loops.
It also negotiates link parameters such as the maximum
packet length and authentication protocol to be used.
Compared with other data link layer protocols, PPP has an
important feature: it provides authentication. The two ends
of a link can negotiate the authentication protocol to be
used and perform authentication; the link can be established
only when authentication succeeds. This feature makes PPP
well suited for carriers providing access to distributed users.
Network Control Protocol (NCP): is used to negotiate the
format and type of packets transmitted on data links. For
example, IP Control Protocol (IPCP) and Internetwork Packet
Exchange Control Protocol (IPXCP) are used to control
parameter negotiation of IP and IPX packets respectively.
PPP extensions: give PPP support functions. For example,
PPP extensions provide the Password Authentication Protocol
(PAP) and Challenge Handshake Authentication Protocol
(CHAP) to ensure network security.

PPP packet format


Flag field

The Flag field identifies the start and end of a physical


frame and is always 0x7E.
Address field

The Address field identifies a peer. Two communicating


devices connected by using PPP do not need to know the
data link layer address of each other because PPP is used
on P2P links. This field must be filled with a broadcast
address of all 1s and is of no significance to PPP.

Control field
The Control field value defaults to 0x03, indicating
an unsequenced frame. By default, PPP does not
use sequence numbers or acknowledgement
mechanisms to ensure transmission reliability.
The Address and Control fields together identify a PPP
packet, so the PPP packet header value is 0xFF03.

Protocol field
The Protocol field identifies the datagram
encapsulated in the Information field of a PPP data
packet.
LCP packet format
Code field
The Code field is 1 byte in length and identifies the
LCP packet type.

Identifier field
The Identifier field is 1 byte long. It is used to match
request and response packets. If a device receives a
packet with an invalid Identifier field, the device
discards the packet.
The sequence number of a Configure-Request
packet usually begins with 0x01 and increases by 1
each time a Configure-Request packet is sent. After
a receiver receives a Configure-Request packet, it
must send a response packet with the same
sequence number as that of the received Configure-Request packet.
Length field
The Length field specifies the total length of the LCP
packet in bytes, including the Code, Identifier, Length,
and Data fields.
The Length field value cannot exceed the maximum
receive unit (MRU) of the link. Bytes outside the
range of the Length field are treated as padding and
are ignored after they are received.
Data field
The Data field carries negotiation options in Type,
Length, Value (TLV) format. The Type field specifies
the negotiation option type. The Length field specifies
the total length of the option, including the Type,
Length, and Data fields. The Data field of the option
contains the content of the negotiation option.
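The fixed header described above (Code, Identifier, Length) can be parsed with a short sketch. The sample packet below is illustrative: a Configure-Request carrying one Magic-Number option, followed by two bytes of padding beyond the Length field:

```python
import struct

def parse_lcp(packet: bytes):
    """Parse the LCP header: Code (1 byte), Identifier (1 byte), and
    Length (2 bytes, covering Code + Identifier + Length + Data).
    Bytes beyond Length are treated as padding and ignored."""
    code, identifier, length = struct.unpack("!BBH", packet[:4])
    data = packet[4:length]     # only bytes inside Length are kept
    return code, identifier, data

# Code 0x01 (Configure-Request), Identifier 0x01, Length 10, then a
# Magic-Number option (Type 5, Length 6, 4-byte value) and 2 pad bytes.
pkt = bytes([0x01, 0x01, 0x00, 0x0A,
             0x05, 0x06, 0x12, 0x34, 0x56, 0x78,
             0x00, 0x00])
```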

The PPP link establishment process is as follows:


Dead: PPP starts and ends with the Dead phase. After the physical
status of two communicating devices becomes Up (marked as UP in
the figure), PPP enters the Establish phase.
Establish: The two devices negotiate link layer parameters in the
Establish phase. If negotiation of link layer parameters fails (marked as
FAIL in the figure), a PPP connection cannot be established and PPP
returns to the Dead phase. If negotiation of link layer parameters
succeeds (marked as OPENED in the figure), PPP enters the
Authenticate phase.
Authenticate: In the Authenticate phase, the authenticating party
authenticates the authenticated party. If authentication fails (marked as
FAIL in the figure), PPP enters the Terminate phase. If authentication
succeeds (marked as SUCCESS in the figure) or no authentication is
configured, PPP enters the Network phase.
Network: In the Network phase, the two devices use NCP to negotiate
network-layer parameters. If negotiation succeeds, a PPP connection
can be established and data packets can be transmitted over the PPP
connection. When the upper-layer protocol determines that the PPP
connection (for example, an on-demand circuit) should be disconnected
or an administrator manually disconnects the PPP connection, PPP
enters the Terminate phase.
Terminate: In the Terminate phase, the two devices use LCP to
disconnect the PPP connection. After the PPP connection is
disconnected (marked as Down in the figure), PPP enters the Dead
phase.

Note: The working phases of PPP listed in this slide are not protocol
states, because PPP is a protocol suite and does not itself maintain
a protocol state. Only specific protocols such as LCP and NCP have a
protocol status that can change from one state to another.

LCP defines three types of packets:


1. Link configuration packets, used to establish and configure links:
Configure-Request, Configure-Ack, Configure-Nak, Configure-Reject.
2. Link termination packets, used to tear down links: Terminate-Request,
Terminate-Ack.
3. Link maintenance packets, used to manage and debug links:
Code-Reject, Protocol-Reject, Echo-Request, Echo-Reply, Discard-Request.

LCP is used to negotiate the following parameters:


MRU is used on the Versatile Routing Platform (VRP) to indicate the
maximum transmission unit configured on an interface.
The PPP authentication protocols include PAP and CHAP. Two ends
of a PPP link can use different protocols to authenticate the peer.
However, the authenticated party must support the authentication
protocol used by the authenticating party and have authentication
information such as the user name and password correctly configured.
LCP uses the magic number to detect link loops and other exceptions.
A magic number is a randomly generated digit. It should be ensured
that the two ends do not generate the same magic number.
After a device receives a Configure-Request packet, it compares the
magic number in the received packet with the locally generated magic
number. If they are different, no link loop exists and the device sends
a Configure-Ack packet (if the other parameters are also successfully
negotiated) to indicate that negotiation of the magic number succeeds.
If subsequent packets contain the Magic-Number field, the field is set
to the successfully negotiated magic number and LCP does not generate
a new magic number.
If the magic number in the received Configure-Request packet is the
same as the locally generated one, the receiver sends a Configure-Nak
packet to the sender, carrying a new magic number. The sender then
sends a new Configure-Request packet carrying a new magic number,
regardless of whether the magic number in the received Configure-Nak
packet is the same as the local one. If a link loop exists, this
process repeats; if no loop exists, normal packet exchange is soon
restored.
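The loop-detection decision can be sketched as follows. This is simplified: a real implementation also evaluates the other negotiated options, and the function name is illustrative:

```python
def magic_number_check(local_magic: int, received_magic: int) -> str:
    """If the magic number in a received Configure-Request differs from
    the locally generated one, no loop is detected and the request can
    be acknowledged; if they match, the link may be looped back, so a
    Configure-Nak carrying a new magic number is returned."""
    if received_magic != local_magic:
        return "Configure-Ack"   # assuming all other options also pass
    return "Configure-Nak"       # propose a new magic number
```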

Link negotiation success:


As shown in the figure, R1 and R2 are connected in series and run
PPP. When the physical status of the link becomes Up, R1 and R2
use the LCP to negotiate link layer parameters. In this example, R1
sends an LCP packet.
R1 sends a Configure-Request packet to R2, carrying link-layer
parameters configured on the sender (R1). The link-layer
parameters use the Type, Length, Value structure.
After receiving the Configure-Request packet, R2 sends a
Configure-Ack packet to R1 if it can identify all the link-layer
parameters in the packet and determines that the value of each
parameter is acceptable.
If R1 does not receive a Configure-Ack packet, it retransmits the
Configure-Request packet once every 3 seconds. If R1 still does not
receive a Configure-Ack packet after retransmitting the
Configure-Request packet 10 consecutive times, it determines that the
peer is unavailable and stops sending Configure-Request packets.
Note: After the process is complete, R2 determines that the link-layer
parameters configured on R1 are acceptable. R2 also needs to
send Configure-Request packets to R1, so that R1 can determine
whether the link-layer parameters configured on R2 are acceptable.

Link negotiation failure:


After R2 receives a Configure-Request packet from R1, R2 sends a
Configure-Nak packet to R1 if R2 can identify all the link-layer
parameters in the packet, but determines that all or some of the
parameter values are unacceptable, indicating that parameter
negotiation fails.
The Configure-Nak packet contains only the parameters whose
values are unacceptable, and the value of each parameter is changed
to a value or value range that is acceptable on R2.
After receiving the Configure-Nak packet, R1 changes the parameter
values used locally based on the values in the Configure-Nak packet,
and then sends a Configure-Request packet.
If negotiation still fails after the Configure-Request packet is sent for
five consecutive times, the parameters are disabled and parameter
negotiation stops.

The link negotiation parameters cannot be identified.


After receiving a Configure-Request packet from R1, R2 sends a
Configure-Reject packet to R1 if R2 cannot identify all or some of the
link-layer parameters in the packet.
The Configure-Reject packet contains only the parameters that
cannot be identified.
After receiving the Configure-Reject packet, R1 sends a
Configure-Request packet to R2, carrying only parameters that can be
identified by R2.

The link state detection process is as follows:


After a connection is set up using LCP, Echo-Request and Echo-Reply
packets can be used to detect the link status. If a device replies
with an Echo-Reply packet each time it receives an Echo-Request
packet, the link status is normal.
By default, the VRP platform sends an Echo-Request packet once
every 10 seconds.

The process of tearing down a connection is as follows:


LCP can tear down an existing connection if the authentication fails or
an administrator manually shuts down the connection.
LCP uses Terminate-Request and Terminate-Ack packets to
disconnect a connection. The Terminate-Request packet is used to
request the peer to disconnect the connection. After receiving a
Terminate-Request packet, the device replies with a Terminate-Ack
packet to confirm that the connection is to be disconnected.
If a device fails to receive a Terminate-Ack packet, it retransmits
the Terminate-Request packet once every 3 seconds. If the device still
does not receive a Terminate-Ack packet after sending the
Terminate-Request packet twice consecutively, it determines that the
peer is unavailable, and then disconnects the connection.

A PAP packet is encapsulated in the PPP packet directly.

The PAP authentication process is as follows:


The authenticated party sends an Authenticate-Request
packet carrying the user name and password in plaintext to
the authenticating party. In this example, the user name
and password are huawei and hello.
After receiving the user name and password from the
authenticated party, the authenticating party compares the
user name and password with those configured locally to
check whether they are correct. If the user name and
password are correct, the authenticating party returns an
Authenticate-Ack packet, indicating that the authentication
succeeds. If the user name and password are incorrect, the
authenticating party returns an Authenticate-Nak packet,
indicating that the authentication fails.
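The check performed by the authenticating party can be sketched as a plaintext comparison (which is exactly why PAP is considered weak). The user table and function name are illustrative; the account matches the example above:

```python
def pap_authenticate(local_users: dict, username: str, password: str) -> str:
    """PAP sketch: compare the plaintext username/password carried in
    the Authenticate-Request against the locally configured entries and
    answer Authenticate-Ack on a match, Authenticate-Nak otherwise."""
    if local_users.get(username) == password:
        return "Authenticate-Ack"
    return "Authenticate-Nak"

users = {"huawei": "hello"}   # locally configured account (from the example)
```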

The Message Digest 5 (MD5) hash algorithm is used to calculate a


16-byte digest over the concatenation of
Identifier+password+challenge. The authenticated party places the
calculated 16-byte digest in the Data field of the Response
packet and sends the packet to the authenticating party.
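The digest computation can be reproduced with Python's hashlib. This is a sketch: field encodings in real CHAP packets are byte-exact, and the sample Identifier, password, and challenge values are made up for illustration:

```python
import hashlib

def chap_response(identifier: int, password: str, challenge: bytes) -> bytes:
    """Compute the CHAP response: the 16-byte MD5 digest of the
    concatenation Identifier + password + challenge."""
    data = bytes([identifier]) + password.encode("ascii") + challenge
    return hashlib.md5(data).digest()

# Both sides compute the digest over the same inputs, so the
# authenticating party can verify the response without the plaintext
# password ever crossing the link.
resp = chap_response(0x01, "hello", b"\x9f\x8e\x7d\x6c")
```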

CHAP is a three-way handshake authentication protocol. The Request


packet and Response packet exchanged between two communicating
devices during one CHAP process contain the same Identifier.
Unidirectional CHAP authentication is applicable to two scenarios: the
authenticating party is configured with a user name, and the
authenticating party is not configured with a user name. It is
recommended that the authenticating party be configured with a user
name.
When the authenticating party is configured with a user name (that is,
the ppp chap user username command is configured on the interface):

The authenticating party initiates an authentication request


by sending a Challenge packet that carries the local user
name to the authenticated party.

After receiving the Challenge packet on an interface, the


authenticated party checks whether the ppp chap password
command is used on the interface. If this command is used,
the authenticated party uses MD5 to calculate the
concatenation of Identifier, password generated by the ppp
chap password command, and a random number. The
authenticated party then sends a Response packet carrying
the calculated ciphertext password and local user name to
the authenticating party. If the ppp chap password
command is not configured, the authenticated party
searches the local user table for the password matching
the user name of the authenticating party in the received
Challenge packet, and encrypts the matching password by
using MD5 in a similar way. The authenticated party sends
a Response packet carrying the calculated ciphertext
password and local user name to the authenticating party.

The authenticating party encrypts the locally saved


password of the authenticated party by using MD5. The
authenticating party then compares the generated
ciphertext password with that carried in the received
Response packet, and returns a response based on the
check result.
When the authenticating party is not configured with a user name
(that is, the ppp chap user username command is not configured on
the interface):
The authenticating party initiates an authentication
request by sending a Challenge packet.
After receiving the Challenge packet, the
authenticated party uses MD5 to calculate the
concatenation of Identifier, password generated by
the ppp chap password command, and a random
number. It then sends a Response packet carrying
the ciphertext password and local user name to the
authenticating party.
The authenticating party encrypts the locally saved
password of the authenticated party by using MD5.
The authenticating party then compares the
generated ciphertext password with that carried in
the received Response packet, and returns a
response based on the check result.

IPCP negotiates IP addresses of two devices to transmit IP packets


over PPP links.
IPCP and LCP have the same negotiation mechanism, packet type,
and working process.
Topology
Configure two IP addresses 12.1.1.1/24 and 12.1.1.2/24 for the two
ends. (IPCP can be used to negotiate IP addresses even if they are
not on the same network segment.)
The static IP address negotiation process is as follows:

R1 and R2 send a Configure-Request packet carrying the


local IP address to each other.

After receiving the Configure-Request packet from the peer,


R1 and R2 check the IP address in the packet. If the IP
address is a valid unicast IP address, and is different from
the local IP address configured, R1/R2 determines that the
peer can use this address and returns a Configure-Ack
packet.

IPCP uses Configure-Request and Configure-Ack packets


to allow the two ends of a PPP link to discover each other's 32-bit
IP address.

As shown in the figure, R1 requests the peer to allocate an IP address


to it. R2 is configured with the static IP address 12.1.1.2/24 and is
configured to allocate the IP address 12.1.1.1 to R1.
The dynamic IP address negotiation process is as follows:
R1 sends a Configure-Request packet carrying the IP address 0.0.0.0
to R2, requesting R2 to allocate an IP address for it.
After receiving the Configure-Request packet, R2 determines that the
IP address 0.0.0.0 is invalid and returns a Configure-Nak packet
carrying a new IP address 12.1.1.1 to R1.
After receiving the Configure-Nak packet, R1 updates the local IP
address, and then sends a Configure-Request packet carrying the new
IP address 12.1.1.1 to R2.
After receiving the Configure-Request packet, R2 determines that the
IP address 12.1.1.1 is valid, and returns a Configure-Ack packet to R1.
In addition, R2 also sends a Configure-Request packet carrying the
IP address 12.1.1.2 to R1. R1 determines that the IP address 12.1.1.2
is valid, and returns a Configure-Ack packet to R2.
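The assigning side of this exchange can be sketched as a single decision. This is simplified: validity checking is reduced to the 0.0.0.0 case, and the addresses match the example above:

```python
def ipcp_reply(requested_ip: str, assigned_ip: str):
    """IPCP sketch on R2's side: 0.0.0.0 in a Configure-Request is
    invalid, so reply Configure-Nak carrying the address to assign;
    a valid requested address is acknowledged with Configure-Ack."""
    if requested_ip == "0.0.0.0":
        return "Configure-Nak", assigned_ip
    return "Configure-Ack", requested_ip

# R1 first requests 0.0.0.0 and is Nak'ed with 12.1.1.1; it then
# requests 12.1.1.1 and receives a Configure-Ack.
first = ipcp_reply("0.0.0.0", "12.1.1.1")
second = ipcp_reply("12.1.1.1", "12.1.1.1")
```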

Multilink PPP fragments a packet and sends the fragments to the same
destination over multiple PPP links.

PPPoE overview
PPPoE allows a large number of hosts on an Ethernet to
connect to the Internet using a remote access device and
controls each host using PPP. PPPoE features a large
application scale, high security, and convenient accounting.

Topology

A PPPoE session is set up between each PC and the


router on the carrier network. Each PC functions as a
PPPoE client and has a unique account, which facilitates
user accounting and control by the carrier. The PPPoE
client software must be installed on the PCs.

The PPPoE session establishment process includes three stages:


Discovery, Session, and Terminate.
Discovery stage:
A PPPoE client broadcasts a PPPoE Active Discovery
Initial (PADI) packet that contains service information
required by the PPPoE client.
After receiving the PADI packet, all PPPoE servers
compare the requested service with the services they can
provide. The PPPoE servers that can provide the
requested service unicast PPPoE Active Discovery Offer
(PADO) packets to the PPPoE client.
Based on the network topology, the PPPoE client may
receive PADO packets from more than one PPPoE server.
The PPPoE client selects the PPPoE server from which the
first PADO packet is received and unicasts a PPPoE Active
Discovery Request (PADR) packet to the PPPoE server.
The PPPoE server generates a unique session ID to
identify the PPPoE session with the PPPoE client. The
PPPoE server sends a PPPoE Active Discovery Session-confirmation (PADS) packet containing this session ID to
the PPPoE client. When the PPPoE session is established,
the PPPoE server and PPPoE client enter the PPPoE
Session stage.
When the PPPoE session is established, the PPPoE server
and PPPoE client share the unique PPPoE session ID and
learn each other's Ethernet address.

Session stage:
PPP negotiation at the PPPoE Session stage is the same
as common PPP negotiation.
When PPP negotiation succeeds, PPP data packets can be
forwarded.
At the PPPoE Session stage, the PPPoE server and client
send all Ethernet data packets in unicast mode.
Terminate stage:
After a PPPoE session is established, the PPPoE client or
the PPPoE server can unicast a PADT packet to terminate
the PPPoE session at any time. When a PADT packet is
received, no further PPP traffic can be sent using this
session.
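The Discovery exchange above can be condensed into a sketch. This is illustrative only: the server names and session ID are made up, and "first PADO received" is modeled by list order:

```python
def pppoe_discovery(servers):
    """Sketch of the PPPoE Discovery stage: broadcast PADI, collect
    PADO offers (in arrival order), pick the server whose PADO arrived
    first, unicast PADR to it, and receive a PADS with the session ID."""
    exchange = ["PADI (broadcast)"]
    exchange += [f"PADO from {s}" for s in servers]   # offers received
    chosen = servers[0]                               # first PADO wins
    session_id = 1                                    # server-generated (example value)
    exchange += [f"PADR to {chosen}", f"PADS (session {session_id})"]
    return chosen, session_id, exchange

chosen, sid, log = pppoe_discovery(["ServerA", "ServerB"])
```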

Four types of FR interfaces are available:


A user's device is called a DTE, and the corresponding
interface type is DTE.
A network device that provides access services for DTE
devices is called a DCE, and the corresponding interface
type is DCE or NNI.
A UNI interface interconnects the DTE and DCE.
An NNI interface interconnects two FR switches.
A Virtual Circuit (VC) is a logical circuit established between two
network devices on the same network.
Based on establishment mode, VCs are classified into two
types:
PVC: refers to the manually created VC.
SVC: refers to the VC that can be created or deleted
automatically through negotiation.
The PVC status of the DTE is determined by the DCE. The
PVC status of the DCE is determined by the network.
VCs are identified by the DLCI and a DLCI takes effect only on a local
interface and its directly connected interface. On an FR network, a
DLCI can identify multiple VCs established on different physical
interfaces.

LMI: local management interface used to monitor the PVC status.


The system supports three LMI protocols: ITU-T Q.933
Annex A, ANSI T1.617 Annex D, and a non-standard
compatible protocol. The non-standard compatible protocol
is used for interconnection with devices from vendors
other than Huawei.
The PVC status of the DTE is determined by the DCE. The
PVC status of the DCE is determined by the network.
When two network devices are directly connected, the PVC
status of the DCE is set by the device administrator.
The LMI negotiation process is as follows:
The DTE periodically sends Status Enquiry messages.
After receiving the Status Enquiry message, the DCE
replies with a Status message.
The DTE parses the received Status message to obtain the
link status and PVC status.
When the DTE and DCE can normally send and receive
LMI negotiation messages, the link protocol status changes
to Up, and the PVC status changes to Active.
The FR LMI negotiation succeeds.

After the FR LMI negotiation succeeds and the PVC status changes to
Active, two devices on a PVC start the InARP negotiation process:
If a protocol address is configured on the local interface,
the local device (for example, R1) sends an Inverse ARP
Request packet to the peer device (for example, R2) over
the VC. The Inverse ARP Request packet carries the
protocol address of R1.
After receiving the Inverse ARP Request packet, R2
obtains the protocol address of R1, generates an address
mapping, and sends an Inverse ARP Response packet to
R1.
After receiving the Inverse ARP Response packet, R1
parses the address of R2 in the packet and generates an
address mapping.
R1 generates the address mapping 12.1.1.2 to 100, while
R2 generates the address mapping 12.1.1.1 to 100.
If a static mapping is configured manually or a dynamic mapping is
created, the local device does not send an InARP Request packet to
the remote device over the VC regardless of whether the remote
address in the address mapping is correct. The local device sends an
InARP Request packet to the remote device only when no mapping
exists.
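The rule in the last paragraph — request only when no mapping exists — can be sketched as follows. The DLCI and address values match the example; the function name is illustrative:

```python
def inarp_learn(mappings: dict, dlci: int, learned_peer_ip: str):
    """InARP sketch for one end of a PVC: if a mapping (static or
    dynamic) already exists for the DLCI, no InARP Request is sent;
    otherwise the peer address learned from the InARP Response is
    recorded as a dynamic mapping."""
    if dlci in mappings:
        return mappings, "no request sent"
    updated = dict(mappings)
    updated[dlci] = learned_peer_ip   # e.g. R1 maps 12.1.1.2 to DLCI 100
    return updated, "InARP Request sent"
```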

Sub-interfaces can solve the problem caused by split horizon on an FR
network. One physical interface can contain multiple logical
sub-interfaces. Each sub-interface can connect to a remote router over
one or multiple DLCIs. The routers are connected over the FR network.
You can define logical sub-interfaces on the serial line.
Every sub-interface uses one or multiple DLCIs to connect
to the remote router. After a DLCI is configured on a
sub-interface, the mapping between the destination protocol
address and this DLCI needs to be created.
As shown in the figure, R4 has only one physical serial
interface S0; however, DLCIs are defined on S0 to connect
the sub-interfaces S0.1, S0.2, and S0.3 to R1, R2, and R3
respectively.
Two types of sub-interfaces are available:
P2P sub-interface: used to connect to a single remote
device. Each P2P sub-interface can be configured with only
one PVC. In this case, the remote device can be
determined uniquely without a static address mapping.
Therefore, when the PVC is configured for the
sub-interface, the peer address is identified.

P2MP sub-interface: used to connect to multiple remote


devices. Each sub-interface can be configured with multiple
PVCs. Each PVC maps the protocol address of its
connected remote device. In this way, different PVCs can
reach different remote devices. You can manually configure
the address mapping, or use InARP to dynamically create
the address mapping.

Case description
NCP can be used to allocate an IP address to the peer.
You need to configure the ppp chap user Huawei command on R1's
interface to enable R1 to send a Challenge packet to R2 carrying the
user name Huawei.

Command usage
ppp authentication-mode: Configures the PPP authentication mode
in which the local device authenticates the remote device.
ppp chap user: Configures a user name for CHAP authentication.
ppp chap password: Configures a password for CHAP
authentication.
ip address ppp-negotiate: Configures IP address negotiation on an
interface to allow the interface to obtain an IP address from the remote
device.
remote address: Configures the local device to assign an IP address
or specify an IP address pool for the remote device.
Usage scenario
Interface view
Parameters
ppp authentication-mode { chap | pap }
chap: Indicates the CHAP authentication mode.
pap: Indicates the PAP authentication mode.
ppp chap user username
username: Specifies a user name for CHAP authentication.
ppp chap password { cipher | simple } password
cipher: Indicates a ciphertext password.
simple: Indicates a plaintext password.
password: Specifies the password for CHAP authentication.
remote address { ip-address | pool pool-name }
ip-address: Specifies an IP address to be allocated to the remote device.
pool pool-name: Specifies the name of the IP address pool, from which
an IP address is allocated to the remote device.

Precautions
In CHAP authentication, the authenticated party does not send the
password to the authenticating party.
The local device can use IPCP to learn the 32-bit host address from
the remote device.

Command usage
interface mp-group: Creates an MP-Group interface and enters the
MP-Group interface view.
ppp mp mp-group: Binds an interface to the MP-Group interface so
that the interface works in MP mode.
restart: Restarts the current interface.
Precautions
Data frames will be lost after you disable the interface. Exercise
caution when you use the restart command.

Case description
You need to get familiar with the configurations of the PPPoE
server and PPPoE client in this case.

Command usage
virtual-template: Creates a VT interface and enters the VT interface
view.
pppoe-server bind virtual-template: Binds a specified VT interface
to an Ethernet interface and enables PPPoE on the Ethernet interface.
remote address: Configures the local device to assign an IP address
or specify an IP address pool for the remote device.
dialer-rule: Enters the dialer rule view.
dialer-rule: Specifies a dialer ACL for a dialer access group and
defines conditions to initiate calls.
interface dialer: Creates a dialer interface and enters the dialer
interface view.
dialer user: Enables the resource-shared DCC and specifies the
remote user name of the dialer interface.
dialer-group: Adds an interface to a dialer access group; that is, it
specifies the number of the associated dialer rule.
dialer bundle: Specifies a dialer bundle for a dialer interface in the
resource-shared DCC.
pppoe-client dial-bundle-number: Specifies a dialer bundle for a
PPPoE session.
Parameters
remote address { ip-address | pool pool-name }
ip-address: Specifies an IP address to be allocated to the remote
device.
pool pool-name: Specifies the name of the IP address pool, from which
an IP address is allocated to the remote device.

dialer-rule dialer-rule-number { acl { acl-number | name acl-name } | ip { deny | permit } | ipv6 { deny | permit } }
dialer-rule-number: Specifies the number of a dialer access group. The
number is the same as the value of group-number in the dialer-group
command.
acl { acl-number | name acl-name }: Indicates the number or name of
the dialer ACL.
ip { deny | permit }: Indicates whether the dialer ACL allows or forbids
IPv4 packets.
Precautions
To configure the local device to allocate an IP address to the remote
device, run the ppp ipcp remote-address forced command in the
interface view.
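Putting the commands above together, a minimal PPPoE server and client might look as follows. This is a hedged sketch: interface numbers, the pool name pool1, the user name, the password, and the addresses are assumptions, and the exact commands can differ across device models and software versions:

```
# PPPoE server
ip pool pool1
 network 10.1.1.0 mask 255.255.255.0
 gateway-list 10.1.1.1
interface Virtual-Template1
 ppp authentication-mode chap
 ip address 10.1.1.1 255.255.255.0
 remote address pool pool1
interface GigabitEthernet0/0/1
 pppoe-server bind virtual-template 1

# PPPoE client (resource-shared DCC)
dialer-rule
 dialer-rule 1 ip permit
interface Dialer1
 dialer user user1
 dialer-group 1
 dialer bundle 1
 ppp chap user user1
 ppp chap password cipher Huawei@123
 ip address ppp-negotiate
interface GigabitEthernet0/0/1
 pppoe-client dial-bundle-number 1
```

The dialer bundle number on the Dialer interface must match the one referenced by pppoe-client dial-bundle-number, and the dialer-group number must match the dialer-rule number.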

Case description
On an FR network, you do not need to manually configure the
address mapping for a P2P sub-interface.

Precautions
You do not need to manually configure the address mapping on a
P2P sub-interface, regardless of whether InARP is disabled.

Topology Description
Broadcast storm
Assume that STP is not enabled on the switching
devices. If PC1 broadcasts a request, the request is
received by port 1 and forwarded by port 2 on S1 and S2.
On S1 and S2, port 2 then receives the request broadcast
by the other switch and port 1 forwards it again. This
transmission repeats endlessly, exhausting resources on
the entire network and causing the network to break
down.
MAC address table flapping
Port 2 on S1 can learn the MAC address of PC2.
Because S2 forwards data frames sent by PC2 to its other
ports, S1 may also learn the MAC address of PC2 on port 1.
S1 continuously modifies its MAC address table,
causing MAC address table flapping.

STP

STP can eliminate network loops. STP builds a loop-free
network (tree) to ensure a unique data transmission
path and prevent infinite looping of packets. STP works at the
data link layer of the OSI model.
STP-capable switches exchange BPDUs and perform
distributed calculation to determine which ports need to be
blocked to prevent loops.

Root bridge
The root bridge is the bridge with the smallest BID, which is
composed of the priority and MAC address.
Root Port
The root port is the port with the smallest root path cost to the root
bridge, and is responsible for forwarding data to the root bridge.
The root port is determined based on the path cost. Among all
STP-capable ports on a network bridge, the port with the
smallest root path cost is the root port. There is only one root
port on an STP-capable device, but there is no root port on the
root bridge.
Designated port and bridge
The bridge closest to the root bridge on each network segment
is used as the designated bridge. The port on the designated
bridge to the network segment is called designated port.
The designated port is responsible for forwarding traffic, and
the designated bridge is responsible for forwarding
configuration BPDUs.
After the root bridge, root port, and designated port are selected
successfully, the entire tree topology is set up. When the topology is
stable, only the root port and the designated port forward traffic. All the
other ports are in Blocking state, and receive only STP BPDUs but not
forward user traffic.

A configuration BPDU is generated in one of the following three scenarios:
When ports are enabled with STP, the designated ports send
configuration BPDUs at intervals specified by the Hello timer.
When a root port receives configuration BPDUs, the device
where the root port resides sends a copy of the configuration
BPDUs to its designated port.
When receiving a configuration BPDU with a lower priority, the
designated port immediately sends its own configuration
BPDUs to the downstream device.
Root identifier
The root identifier is composed of the priority and MAC
address of the root bridge. The default priority is 32768.
Root path cost
Cumulative cost of all links to the root bridge.
Bridge Identifier (BID)
BID of the device sending configuration BPDUs. On a LAN,
the BID is the ID of the designated bridge.
Port Identifier (PID)
PID of the port sending configuration BPDUs. The PID
consists of the port priority and port number. On a LAN, the
PID is the ID of the designated port.

Hello Time
The Hello timer specifies the interval at which an STP-capable
device sends configuration BPDUs to detect link faults.
After the network topology becomes stable, a modified Hello
interval takes effect only when the device on which it is
configured becomes the root bridge.
After the topology changes, TCN BPDUs are sent, but their
transmission is not governed by the Hello timer.
The default value is 2 seconds.
Max Age
After a non-root bridge running STP receives a configuration
BPDU, the non-root bridge compares the Message Age value
with the Max Age value in the received configuration BPDU.
If the Message Age value is smaller than or equal to
the Max Age value, the non-root bridge forwards the
configuration BPDU.
If the Message Age value is larger than the Max Age
value, the configuration BPDU ages and the non-root
bridge directly discards it. In this case, the network size
is considered too large and the non-root bridge
disconnects from the root bridge.
In practice, each time a configuration BPDU
passes through a bridge, the Message Age value increases
by 1.
The default Max Age value is 20 seconds.
Forward Delay
The Forward Delay timer specifies the delay for interface
status transition. The default value is 15 seconds.
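On Huawei devices, these timers can be tuned globally in the system view. As a hedged sketch (on VRP the values are specified in centiseconds, so the commands below correspond to the defaults; exact syntax may vary by software version):

```
stp timer hello 200           # Hello Time: 2 seconds
stp timer max-age 2000        # Max Age: 20 seconds
stp timer forward-delay 1500  # Forward Delay: 15 seconds
```

To keep the topology stable, timer changes should normally be made on the root bridge, whose BPDUs carry the timer values to the rest of the network.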

STP Topology Calculation


After all devices on the network are enabled with STP, each
device considers itself as the root bridge. Each device only
transmits and receives BPDUs but does not forward user
traffic. All ports are in Listening state. After exchanging
configuration BPDUs, all devices participate in the selection of
the root bridge, root port, and designated port.
During network initialization, every device considers itself as
the root bridge and sets the root bridge ID as the device ID.
Devices exchange configuration BPDUs to compare the root
bridge IDs. The device with the smallest BID is elected as the
root bridge.
The switch priority is configurable. The value ranges from 0 to
65535. The default priority is 32768.
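To influence the election described above, the bridge priority can be lowered on the device intended to become the root bridge. A hedged sketch (the permitted priority values and granularity depend on the device; on many Huawei switches the priority must be a multiple of 4096):

```
stp mode stp
stp priority 4096
# Alternatively, let the device derive a suitably low priority:
stp root primary
```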
Topology Description
Assume that the priorities of S1 and S2 are 0 and 1. Port A on
S1 connects to Port B on S2. S1 sends the configuration
BPDU of {0, 0, 0, Port A} and S2 sends the configuration
BPDU of {1, 0, 1, Port B}. After the two switches compare the
configuration BPDUs, S1 is deemed to have a higher priority
than S2, so S1 becomes the root bridge.

Topology Description
Priorities of S1, S2, and S3 are 0, 1, and 2, and the path costs
between S1 and S2, between S1 and S3, and between S2 and
S3 are 5, 10, and 4 respectively.
Initial configuration BPDUs on ports of S1, S2, and S3:
S1: {0, 0, 0, PortA1} on PortA1 and {0, 0, 0, Port A2} on
Port A2
S2: {1, 0, 1, PortB1} on PortB1 and {1, 0, 1, Port B2} on
Port B2
S3: {2, 0, 2, PortC1} on PortC1 and {2, 0, 2, Port C2}
on Port C2

First exchange of configuration BPDUs


Ports on S1, S2, and S3 send their configuration BPDUs. Each
network bridge considers itself as the root bridge, so the RPC
is 0.

Comparison for the first exchange of configuration BPDUs


S1
Port A1 receives the configuration BPDU {1, 0, 1, Port
B1} from Port B1 and finds that its configuration BPDU
{0, 0, 0, Port A1} has higher priority than the
configuration BPDU {1, 0, 1, Port B1}, so Port A1
discards the configuration BPDU {1, 0, 1, Port B1}.
Port A2 receives the configuration BPDU {2, 0, 2, Port
C1} from Port C1 and finds that its configuration BPDU
{0, 0, 0, Port A2} has higher priority than the
configuration BPDU {2, 0, 2, Port C1}, so Port A2
discards the configuration BPDU {2, 0, 2, Port C1}.
After finding that both the root and the designated
switch IDs refer to itself in the configuration BPDU on
each port, S1 considers itself as the root bridge. S1
then sends configuration BPDUs from each port
periodically without modifying the configuration BPDUs.
The configuration BPDU {0, 0, 0, Port A1} on Port
A1 and configuration BPDU {0, 0, 0, Port A2} on
Port A2 are optimal.
Because S1 is the root bridge, all ports on S1 are
designated ports.

S2

Port B1 receives the configuration BPDU {0, 0, 0, Port
A1} from Port A1 and finds that the received configuration
BPDU {0, 0, 0, Port A1} has a higher priority than its own
configuration BPDU {1, 0, 1, Port B1}, so Port B1
updates its configuration BPDU.
Port B2 receives the configuration BPDU {2, 0, 2, Port
C2} from Port C2 and finds that its configuration BPDU
{1, 0, 1, Port B2} has a higher priority than the
configuration BPDU {2, 0, 2, Port C2}, so Port B2
discards the configuration BPDU {2, 0, 2, Port C2}.
The configuration BPDU {0, 0, 0, Port A1} on Port
B1 and the configuration BPDU {1, 0, 1, Port B2} on
Port B2 are optimal.
Comparison of configuration BPDUs on ports:
S2 compares the configuration BPDU on each
port and finds that the configuration BPDU on
Port B1 has the highest priority, so Port B1 is
used as the root port and the configuration
BPDU on Port B1 remains unchanged.
S2 calculates the BPDU {0, 5, 1, Port B2} for
Port B2 based on the configuration BPDU and
path cost of the root port, and compares the
configuration BPDU {0, 5, 1, Port B2} with its
configuration BPDU {1, 0, 1, Port B2} on Port
B2. S2 finds that the calculated configuration
BPDU has a higher priority, so Port B2 is used
as the designated port, and its configuration
BPDU is replaced by the calculated
configuration BPDU and the calculated
configuration BPDU is sent periodically.

S3

Port C1 receives the configuration BPDU {0, 0, 0, Port
A2} from Port A2 and finds that the configuration BPDU
{0, 0, 0, Port A2} has a higher priority than its
configuration BPDU {2, 0, 2, Port C1}, so Port C1
updates its configuration BPDU.
Port C2 receives the configuration BPDU {1, 0, 1, Port
B2} from Port B2 and finds that the configuration BPDU
{1, 0, 1, Port B2} has a higher priority than its
configuration BPDU {2, 0, 2, Port C2}, so Port C2
updates its configuration BPDU.

The configuration BPDU {0, 0, 0, Port A2} on Port
C1 and the configuration BPDU {1, 0, 1, Port B2} on
Port C2 are optimal.
Comparison of configuration BPDUs on ports:
S3 compares the configuration BPDU on each
port and finds that the configuration BPDU on
Port C1 has the highest priority, so Port C1 is
used as the root port and the configuration
BPDU on Port C1 remains unchanged.
S3 calculates the configuration BPDU {0, 10, 2,
Port C2} for Port C2 based on the configuration
BPDU and path cost of the root port, and
compares the configuration BPDU {0, 10, 2,
Port C2} with its configuration BPDU {1, 0, 1,
Port B2} on Port C2. S3 finds that the calculated
configuration BPDU has a higher priority, so
Port C2 is used as the designated port and its
configuration BPDU is replaced by the
calculated configuration BPDU.

Second exchange of configuration BPDUs


S1 is the root bridge. Configuration BPDUs sent by S1
The configuration BPDU sent by Port A1 is {0, 0, 0,
Port A1}.
The configuration BPDU sent by Port A2 is {0, 0, 0,
Port A2}.
Configuration BPDUs sent by S2
S1 is the root bridge, so S2 does not send
configuration BPDUs to S1.
The configuration BPDU sent by Port B2 is {0, 5, 1,
Port B2}.
Configuration BPDUs sent by S3
S1 is the root bridge, so S3 does not send
configuration BPDUs to S1.
The configuration BPDU sent by Port C2 is {0, 10, 2,
Port C2}.

Comparison for the second exchange of configuration BPDUs


S2
Port B1 receives the configuration BPDU {0, 0, 0, Port
A1} from Port A1 and finds that the received
configuration BPDU is the same as its own
configuration BPDU, so Port B1 discards the received
one.
Port B2 receives the configuration BPDU {0, 10, 2, Port
C2} from Port C2 and finds that its configuration BPDU
{0, 5, 1, Port B2} has a higher priority, so Port B2
discards it.
After comparison, the optimal configuration BPDUs
on Port B1 and Port B2 are {0, 0, 0, Port A1} and {0,
5, 1, Port B2} respectively.
Because the optimal configuration BPDU on each port
remains unchanged, the port role does not change.
S3
Port C1 receives the configuration BPDU {0, 0, 0, Port
A2} from S1 and finds that the received configuration
BPDU is the same as its own configuration BPDU, so
Port C1 discards the received one.
Port C2 receives the configuration BPDU {0, 5, 1, Port
B2} from S2 and compares it with its configuration
BPDU {0, 10, 2, Port C2}.

Because the root bridge IDs are the same, the root path
costs are compared. Port C2 finds that the received
configuration BPDU has a higher priority (root path cost
5 is smaller than 10), so Port C2 updates its BPDU to
{0, 5, 1, Port B2}.
After comparison, the optimal configuration BPDUs
on Port C1 and Port C2 are {0, 0, 0, Port A2} and {0,
5, 1, Port B2} respectively.
Comparison of configuration BPDUs on each port:
S3 compares the root path cost of Port C1 (root
path cost of 0 in the received configuration
BPDU + path cost 10 of the link) with the root
path cost of Port C2 (root path cost of 5 in the
received configuration BPDU + path cost 4 of
the link). The root path cost of Port C2 is
smaller, so the configuration BPDU of Port C2
is preferred. Port C2 is used as the root port
and its configuration BPDU remains unchanged.
S3 calculates the configuration BPDU {0, 9, 2,
Port C1} for Port C1 according to the
configuration BPDU and path cost of the root
port, and compares the calculated configuration
BPDU with its configuration BPDU. S3 finds
that its configuration BPDU has a higher priority,
so Port C1 is blocked and the configuration
BPDU of S3 remains unchanged. In this case,
Port C1 does not forward data. Spanning tree
recalculation may be triggered again by later events,
for example, if the link between S2 and S3 goes
down.

Topology on the Left Side


According to the root bridge selection principle of STP, S1 is
the root bridge. Then determine the root port, designated port,
and alternate port.
E0 and E1 on S2 receive BPDUs {0, 0, 0, E0} and {0, 0, 0, E1}
from S1. In the two BPDUs, only the transmit port is different.
The port with smaller PID has a higher priority, so E0 is the
root port and E1 is the alternate port.
Topology on the Right Side
According to the root bridge selection principle of STP, S1 is
the root bridge. Then determine the root port, designated port,
and alternate port.
E0 and E1 on S2 receive BPDUs {0, 0, 0, E0} and {0, 0, 0, E1}
from S1. The two BPDUs have the same priority, only the PIDs
are compared. E0 has smaller PID, so E0 is the root port and
E1 is the alternate port.

Generally, only the root bridge generates and sends configuration
BPDUs. Non-root bridges only relay the configuration BPDUs
received on the root port through their designated ports. The
designated port on a non-root bridge sends its optimal BPDU
immediately only after receiving a BPDU with a lower priority.

Topology description:
After S2 receives a BPDU with a lower priority from S4, S2
sends a configuration BPDU. This is because network bridges
save the optimal configuration BPDU.

Topology Description
The figure on the left side shows the initial topology. The path
costs are the same. S1, S2, and S3 are connected, S1 is the
root bridge, and the interconnected ports are in Forwarding state.
In the figure on the right side, a link between S1 and S2 is added.
After S2 receives BPDUs from S1 and S3, S2 considers that
the port connected to S1 is the new root port and the port
connected to S3 is the designated port. All ports are root ports
or designated ports in forwarding state. In this case, a loop
occurs. The loop can be eliminated only when configuration
BPDUs are transmitted to each network bridge and S2 blocks
the port connected to S3 through calculation.
There is a delay for a port (for example, port E on S2) to
change from non-forwarding to forwarding so that ports that
want to enter the non-forwarding state can complete spanning
tree calculation.

Forward Delay
The default interval for port status transition is 15 seconds.
The Forward Delay, Hello, and Max Age timers are mathematically
related; their default values are derived assuming a network
diameter of 7.
Port Status Description
After a port is enabled, the port enters the Listening state and
starts the spanning tree calculation.
If the port needs to be configured as the alternate port through
calculation, the port enters the Blocking state.
If the port needs to be configured as the root port or
designated port through calculation, the port enters the
Learning state from the Listening state after a Forward Delay
period. The port then enters the Forwarding state from the
Learning state after the Forward Delay period. The port in
Forwarding state can forward data frames.

Huawei switch port status


Huawei datacom devices use MSTP by default. After a device
switches from MSTP mode to STP mode, its STP-capable ports
support the same port states as MSTP-capable ports, namely
Forwarding, Learning, and Discarding.

Port status transition


The port is initialized or enabled.
The port is blocked or the link fails.
The port is selected as the root port or designated port.
The port is no longer the root port or designated port.
The Forward Delay timer expires.

TCN BPDU processing:


After the network topology changes, the downstream device
whose port has transitioned to the Forwarding state continuously
sends TCN BPDUs to its upstream device.
After the upstream device receives the TCN BPDU from the
downstream device, only the designated port processes it. The
other ports may receive the TCN BPDU but do not process it.
The upstream device sets the TCA bit of the Flags field in the
configuration BPDU to 1 and returns the configuration BPDU
to instruct the downstream device to stop sending TCN
BPDUs.
The upstream device sends a copy of the TCN BPDU to the
root bridge.
Steps 1 to 4 repeat until the root bridge receives the TCN
BPDU.
After receiving the TCN BPDU, the root bridge sets the TCA
bit in its next configuration BPDU for acknowledgment
and sets the TC bit of the Flags field in the configuration BPDU
to 1 to notify all network bridges of the topology change.
For the sum of the Max Age and Forward Delay periods, the root
bridge keeps sending BPDUs with the TC bit set. A network
bridge that receives such a BPDU reduces the aging time of its
MAC address entries to the Forward Delay period.

Topology Description:
Through STP calculation, S1 is the root bridge and port E1 on
S4 is blocked.
When the link on port E1 of S3 fails, STP is
recalculated. Port E1 on S4 becomes a designated port
and enters the Forwarding state, and S4 immediately
sends a TCN BPDU upstream.
After S2 receives the TCN BPDU from S4, S2 sets the TCA
bit in its next configuration BPDU and sends it to S4
from port E3. S2 also sends a TCN BPDU toward the root
bridge from its root port E1.
After S1 receives the TCN BPDU from S2, S1 sets the TCA
and TC bits in its next configuration BPDU and sends
it to S2 from the designated port E1. For a period of 35
seconds (20 seconds + 15 seconds), S1 keeps the TC bit set
in its configuration BPDUs. After receiving a configuration
BPDU with the TC bit set, each network bridge changes the
aging time of its MAC address entries to 15 seconds.
When the topology changes, the MAC address table is
rebuilt quickly, which avoids wasting bandwidth on
unnecessary flooding.

Root bridge failure:


When S1 becomes faulty, S2 and S3 cannot receive BPDUs
from the root bridge. S2 and S3 detect the root bridge failure
only after a Max Age period. S2 and S3 then determine the
new root bridge, root port, and designated port. The topology
convergence period is 50 seconds (the Max Age period plus
twice the Forward Delay period).
Link failure:
When the link between S3 and S1 fails, S3 can immediately
detect this event. The blocked port on S3 immediately enters
the Listening state and sends the configuration BPDU with
itself as the root. After S2 receives the BPDU with lower
priority from S3, S2 sends a configuration BPDU with S1 as
the root. The port on S2 connected to S3 therefore becomes
the root port, and the port on S3 connected to S2 becomes the
designated port. The period for the S3 port status change from
Listening, Learning, to Forwarding is 30 seconds.
When a link fails or is added, the fault can be rectified after 30 seconds.

STP Limitation:
Port states and port roles are not distinguished in a fine-grained
manner. For example, ports in the Listening and Blocking
states behave identically: neither forwards user traffic or learns MAC addresses.
The STP algorithm determines topology changes after the time
set by the timer expires, which slows down network
convergence.
The STP algorithm requires a stable network topology:
configuration BPDUs sent by the root bridge are relayed
hop by hop by other devices before they reach the entire
network, which slows propagation.

RSTP has all functions of STP, and RSTP-capable and STP-capable network bridges can work together.

RSTP defines four port roles: root port, designated port, alternate port,
and backup port.
The functions of the root port and designated port are the same as
those defined in STP. The alternate port and backup port are described
as follows.
From the perspective of configuration BPDU transmission:
An alternate port is blocked after learning the
configuration BPDUs with a higher priority from other
bridges.
A backup port is blocked after receiving a
configuration BPDU with a higher priority sent by its own bridge.
From the perspective of user traffic:
An alternate port backs up the root port and provides
an alternate path from the designated bridge to the root
bridge.
A backup port backs up the designated port and
provides an alternate path from the root bridge to a
network segment.
After all RSTP-capable ports are assigned roles, topology convergence
is completed.

Port statuses are simplified from five types to three types. Based on
whether a port forwards user traffic and learns MAC addresses, the port
is in one of the following states:
If a port neither forwards user traffic nor learns MAC
addresses, the port is in Discarding state.
If a port does not forward user traffic but learns MAC
addresses, the port is in Learning state.
If a port forwards user traffic and learns MAC addresses, the
port is in Forwarding state.
RSTP Calculation
Roles of ports in Discarding state are determined:
The root port and designated port enter the learning
state after the Forward Delay period. A port in Learning
state learns MAC addresses and enters the Forwarding
state after a Forward Delay period. RSTP accelerates
this process using another mechanism.
An alternate port maintains a Discarding state.

Configuration BPDUs in RSTP are differently defined. Port roles are


described based on the Flags field defined in STP. When compared
with STP, RSTP slightly redefines the format of configuration BPDUs.
The value of the Type field is no longer set to 0 but 2. The
STP-capable device therefore always discards the
configuration BPDUs sent by an RSTP-capable device.
The 6 bits in the middle of the Flags field, reserved in STP, are redefined in RSTP.
Such a configuration BPDU is called an RST BPDU.
Flags field in an RST BPDU:
Bit 0 indicates the TC bit, which is the same as that in STP.
Bit 1 indicates the Proposal flag bit, indicating that the BPDU is
the Proposal packet in the fast convergence mechanism.
Bit 2 and bit 3 indicate the port role. The value 00 indicates the
unknown port; the value 01 indicates the root port; the value
10 indicates the alternate or backup port; the value 11
indicates the designated port.
Bit 4 indicates that the port is in Learning state.
Bit 5 indicates that the port is in Forwarding state.
Bit 6 indicates the Agreement packet in the fast convergence
mechanism.
Bit 7 indicates the TCA bit, which is the same as that in STP.

Configuration BPDUs are processed in a different manner.


Transmission of configuration BPDUs after the topology
becomes stable
In STP, after the topology becomes stable, the root
bridge sends configuration BPDUs at an interval set by
the Hello timer. A non-root-bridge does not send
configuration BPDUs until it receives configuration
BPDUs sent from the upstream device. This renders
the STP calculation complicated and time-consuming.
In RSTP, after the topology becomes stable, a non-root bridge
sends configuration BPDUs at the interval
set by the Hello timer, regardless of whether it has
received configuration BPDUs from the root
bridge. Such operations are implemented on each
device independently.
Shorter timeout interval of BPDUs
In STP, a device has to wait for the Max Age period
before determining a negotiation failure. In RSTP, if a
port does not receive configuration BPDUs sent from
the upstream device for three consecutive intervals set
by the Hello timer, the negotiation between the local
device and its peer fails.

Processing of RST BPDUs with lower priority


In RSTP, when a port receives an RST BPDU from the
upstream designated bridge, the port compares the
received RST BPDU with its own RST BPDU. If its own
RST BPDU has higher priority than the received one,
the port discards the received RST BPDU and
immediately responds to the upstream device with its
own RST BPDU. After receiving the RST BPDU, the
upstream device updates its own RST BPDU based on
the corresponding fields in the received RST BPDU. In
this manner, RSTP processes BPDUs with lower
priority more rapidly, independent of any timer that is
used in STP.

STP convergence
To eliminate loops, STP uses timers to complete convergence.
The default period from the time the port is enabled to the time
the port is in Forwarding state is 30 seconds. Shortening the
values of timers may cause the network to become unstable.
RSTP fast convergence
Edge port
In RSTP, a designated port on the network edge is
called an edge port. An edge port directly connects to a
terminal and does not connect to any other switching
devices. An edge port does not receive configuration
BPDUs, so it does not participate in the RSTP
calculation. It can directly change from the Disabled
state to the Forwarding state without any delay, just like
an STP-incapable port. If an edge port receives bogus
configuration BPDUs from attackers, it becomes a
common STP port. The STP recalculation is performed,
causing network flapping.
Fast switching of the root port
If the root port fails, the optimal alternate port on the
network becomes the root port and enters the
Forwarding state. This is because there must be a path
from the root bridge to a designated port on the
network segment connecting to the alternate port.

Proposal/Agreement mechanism
When a port is selected as a designated port, in STP,
the port does not enter the Forwarding state until a
Forward Delay period expires; in RSTP, the port enters
the Discarding state, and then the Proposal/Agreement
mechanism allows the port to immediately enter the
Forwarding state. The Proposal/Agreement mechanism
must be applied on the P2P links in full-duplex mode.
The P/A mechanism is short for the
Proposal/Agreement mechanism

Edge port
An edge port directly connects to a terminal. When the network
topology changes, loops do not occur on the edge port. The
edge port therefore can directly enter the Forwarding state
without waiting for two Forward Delay periods.
An edge port does not receive configuration BPDUs, so it does
not participate in the RSTP calculation. It can directly change
from the Disabled state to the Forwarding state without any
delay, just like an STP-incapable port. If an edge port receives
bogus configuration BPDUs from attackers, it becomes a
common STP port. The STP recalculation is performed,
causing network flapping.
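A hedged configuration sketch for an edge port follows (the interface number is an assumption; exact commands may vary by device version). The global BPDU protection command guards against the bogus-BPDU attack described above by error-disabling an edge port that receives a BPDU instead of letting it rejoin STP calculation:

```
interface GigabitEthernet0/0/1
 stp edged-port enable
 quit
stp bpdu-protection
```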

Fast switching of the root port


In RSTP, an alternate port is the backup of the root port. When
the root port of a network bridge enters the Discarding state,
the optimal alternate port becomes the new root port and
enters the Forwarding state. This is because the network segment
connected to the alternate port must have a designated port
with a path to the root bridge.

P/A mechanism
The Proposal/Agreement (P/A) mechanism enables a
designated port to rapidly enter the Forwarding state.
The P/A mechanism requires that the link between two
switching devices should be P2P and work in full-duplex mode.
When P/A negotiation fails, the designated port is selected
after two Forward Delay periods. The negotiation process is
the same as that in STP.
After a new link is established, the negotiation process of the
P/A mechanism is as follows:
p0 and p1 become designated ports and send RST
BPDUs.
After receiving an RST BPDU with higher priority, p1
on S2 determines that it will become a root port but not
a designated port. p1 then stops sending RST BPDUs.
p0 on S1 enters the Discarding state and sends RST
BPDUs with the Proposal field of 1.
After receiving an RST BPDU with the Proposal field of
1, S2 sets the sync variable to 1 for all its ports.
As p2 has been blocked, its status remains unchanged;
p4 is an edge port and does not participate in
calculation. Only the non-edge designated port p3
therefore needs to be blocked.

After p2, p3, and p4 enter the Discarding state, their
synced variables are set to 1. The synced variable of
the root port p1 is then set to 1, and p1 sends an RST
BPDU with the Agreement field of 1 to S1. Except that
the Agreement field is set to 1 and the Proposal field
is set to 0, this RST BPDU is identical to the one
received.
After receiving this RST BPDU, S1 identifies the RST
BPDU as a response to the Proposal packet that it just
sent, and p0 immediately enters the Forwarding state.

The P/A negotiation with the downstream device proceeds as follows.


When a link between S1 and S2 is added, the P/A mechanism works
as follows:
S1 sends an RST BPDU with the Proposal field of 1 to S2.
After receiving the RST BPDU, S2 determines that E2 is the
root port. S2 blocks designated ports of E1 and E3, sets the
root port to the Forwarding state, and sends an Agreement
packet to S1.
After S1 receives the Agreement packet, its designated port
E1 immediately enters the Forwarding state.
The non-edge designated ports E1 and E3 on S2 send
Proposal packets.
After S3 receives the Proposal packets from S2, S3
determines that E1 is the root port and starts synchronization.
Because the downstream port of S3 is the edge port, S3
directly sends an Agreement packet.
After S2 receives the Agreement packet from S3, its port E1
immediately enters the Forwarding state.
The process on S4 is similar to that on S3.
After S2 receives the Agreement packet from S4, its port E3
immediately enters the Forwarding state.
The P/A process is completed.

In RSTP, if a non-edge port changes to the Forwarding state, the
topology is considered to have changed.
After a switching device detects the topology change (TC), it performs
the following operations:
Start a TC While timer for every non-edge port. The TC While
timer value is twice the Hello timer value. Before the timer
expires, the MAC address entries learned on the ports whose
status has changed are cleared, and these ports send RST
BPDUs with the TC bit set to 1. Once the TC While timer
expires, the ports stop sending RST BPDUs.
After another switching device receives the RST BPDU, it
clears the MAC addresses learned by all ports excluding the
one that receives the RST BPDU. The switching device then
starts a TC While timer for all non-edge ports and the root port.
The process is similar.
In this manner, RST BPDUs flood the network.

When a port switches from RSTP to STP, the port loses RSTP features
such as fast convergence.
On a network where both STP-capable and RSTP-capable devices are
deployed, STP-capable devices ignore RST BPDUs. If a port on an
RSTP-capable device receives a configuration BPDU from an STP-capable
device, the port switches to the STP mode after two intervals
specified by the Hello timer and starts to send configuration BPDUs. In
this manner, RSTP and STP are interoperable.
After STP-capable devices are removed, Huawei RSTP-capable
datacom devices can switch back to the RSTP mode.

RSTP, an enhancement to STP, implements fast convergence of the
network topology. However, RSTP and STP share a defect: all VLANs
on a LAN use one spanning tree, so VLAN-based load balancing
cannot be performed. Once a link is blocked, it no longer transmits
traffic, wasting bandwidth and preventing certain VLAN packets from
being forwarded.
Topology Description
STP or RSTP is deployed on the LAN. The broken line shows
the spanning tree; S6 is the root switching device; the links
between S1 and S4 and between S2 and S5 are blocked.
VLAN packets are transmitted by using only the links marked
with "VLAN2" or "VLAN3." PC2 and PC3 belong to VLAN 2 but
they cannot communicate with each other because the link
between S2 and S5 is blocked and the link between S3 and
S6 rejects packets from VLAN 2.
MSTP can be used to address this issue. MSTP implements fast
convergence and provides multiple paths to load balance VLAN traffic.

A Multiple Spanning Tree (MST) region contains multiple switching
devices and the network segments between them. The switching devices
of one MST region have the following identical characteristics:
MSTP-enabled
Region name
VLAN-MSTI mappings
MSTP revision level
An instance is a collection of VLANs. Binding multiple VLANs to an
instance saves communication costs and reduces resource usage. The
topology of each MSTI is calculated independently of the others, and
traffic can be balanced among MSTIs. Multiple VLANs that have the
same topology can be mapped to one instance. The forwarding status
of these VLANs on a port is determined by the port status in the MSTI.

The Common and Internal Spanning Tree (CIST), calculated using STP
or RSTP, connects all switching devices on a switching network.
The CIST root is the network bridge with the highest priority on
the entire network, that is, root bridge of the CIST.
In the preceding topology, the lines in red in MSTIs and the
lines in blue between MSTIs form a CIST. The root bridge of
the CIST is S1 in MST region 1.
A Common Spanning Tree (CST) connects all the MST regions on a
switching network.
The CST is calculated by all nodes using STP or RSTP.
In the preceding topology, the lines in blue form a CST. The
CST root is MST region 1.
An Internal Spanning Tree (IST) resides within an MST region.
Each spanning tree in an MST region has an MSTI ID. An IST
is a special MSTI with the MSTI ID of 0, called MSTI 0. The
VLANs that do not map to other MSTIs map to MSTI 0.
An IST is a segment of the CIST in an MST region.
In the preceding topology, the lines in red form an IST.
The master bridge is the IST master, which is the switching device
closest to the CIST root in a region.
If the CIST root is in an MST region, the CIST root is the
master bridge of the region.

In the preceding topology, S1, S4, and S7 are master bridges.


A Single Spanning Tree (SST) is formed in either of the following
situations:
A switching device running STP or RSTP belongs to only one
spanning tree.
An MST region has only one switching device.
There is no SST in the preceding topology.

MSTI
An MST region can contain multiple spanning trees, each
called an MSTI. An MSTI regional root is the root of the MSTI.
Each MSTI has its own regional root.
MSTIs are independent of each other. An MSTI can map to
one or more VLANs, but one VLAN can map to only one MSTI.
Each MSTI has an MSTI ID. MSTI IDs start from 1,
distinguishing MSTIs from the IST (MSTI 0).
In the preceding topology, VLAN 2 maps to MSTI 2 and VLAN
4 to MSTI 4.
MSTI regional root
The MSTI regional root is the network bridge with the highest
priority in each MSTI. You can specify different roots in
different MSTIs.
In the preceding topology, assuming that S9 has the highest
priority in MSTI 2, S9 is the regional root in MSTI 2. Assuming
that S8 has the highest priority in MSTI 4, S8 is the regional
root in MSTI 4.

When compared to RSTP, MSTP has two additional port types. MSTP
ports include the root port, designated port, alternate port, backup port,
edge port, master port, and regional edge port.
Master port
A master port is on the shortest path connecting MST
regions to the CIST root.
BPDUs of an MST region are sent to the CIST root
through the master port.
Master ports are special regional edge ports,
functioning as root ports in the CIST and master ports
in instances.
In the preceding topology, the port on S7 connected to
MST region 1 is the master port.
Regional edge port
A port connecting the network bridge in an MST region
to another MST region or to an STP- or RSTP-enabled
network bridge is a regional edge port.
In the preceding topology, the port on S8 connected to
MST region 2 is the regional edge port.

Network bridges may have different roles in different MSTIs, so ports
on network bridges, with the exception of the master port, may have
different roles. The master port retains its role in all MSTIs.

Currently, there are two MST BPDU formats:


dot1s: BPDU format defined in IEEE 802.1s
legacy: private BPDU format
Using the stp compliance command, you can configure a port
on a Huawei datacom device to automatically adjust the MST
BPDU format.
Except for the MSTP-specific fields, the fields in an intra-region or
inter-region MST BPDU are the same as those in an RST BPDU.
The Root ID field in an RST BPDU indicates the CIST root ID
in an MST BPDU.
The EPC field in an MST BPDU indicates the total path cost
from the MST region where the network bridge sending the
BPDU resides to the MST region where the CIST root resides.
The Bridge ID field in an MST BPDU indicates the regional
root ID in the CIST.
The Port ID field in an MST BPDU indicates the ID of the
designated port in the CIST.
MSTP-specific fields:
Version 3 Length: indicates the BPDUv3 length, which
is used to check received MST BPDUs.
MST Configuration Identifier: indicates the MST
configuration identifier, which has four fields.

This field identifies an MST region where a network


bridge is located. Neighboring switches are in the same
MST region only when the following fields on the
switches are the same:
Format Selector: indicates the 802.1s-defined
protocol selector. It has a fixed value of 0.
Name: indicates the configuration name, that is,
the MST region name of a switch. The value
has 32 bytes. Each switch has an MST region
name configured. The default value is the
switch's MAC address.
Config Digest: indicates the configuration digest,
which has 16 bytes. Switches in an MST region
should maintain the same mapping between
VLANs and MSTIs. However, the MST
configuration table is too large (8192 bytes) and
cannot be easily transmitted between switches.
This field is the digest calculated from the MST
configuration table using the MD5 algorithm.
Revision Level: indicates the revision level of an
MST region, which has two bytes. The default
value is all 0s. Because the Config Digest field
is a digest of the MST configuration table, there
is a small probability that different MST
configuration tables produce the same digest.
In this case, switches in different MST regions
may be incorrectly considered to be in the same
MST region. It is recommended that different
MST regions use different revision levels to
prevent this problem.
CIST Internal Root Path Cost: indicates the total path
cost from the local port to the IST master. This value is
calculated based on link bandwidth.
CIST Bridge Identifier: indicates the ID of the
designated switching device on the CIST.
CIST Remaining Hops: indicates the remaining hops of
a BPDU in the CIST. This field is used to limit the MST
scale. A BPDU has the maximum hop count on the
CIST regional root. The hop count decreases by 1
every time the BPDU passes a network bridge. The
network bridge discards the BPDU with the hop of 0.
MSTI Configuration Messages (may be absent):
indicates an MSTI configuration message.
MSTI Flag: has eight bits. Bits 1 to 7 are the
same as those in RSTP. Bit 8 indicates whether
the network bridge is the master bridge, and
replaces the TCA bit in RSTP.

MSTI region Root ID: indicates the regional root


ID of the MSTI.
MSTI IRPC: indicates the path cost from the
network bridge sending the BPDU to the MSTI
regional root.
MSTI Bridge Priority: indicates the priority of the
network bridge that sends the BPDU.
MSTI Port Priority: indicates the priority of the
port that sends the BPDU.
MSTI Remaining Hops: indicates the remaining
number of hops in an MSTI.
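The Config Digest computation described above can be sketched as follows. This is a simplified illustration: the table layout (4096 two-byte MSTI IDs, 8192 bytes in total) follows the description above, but the exact 802.1s encoding, including the HMAC signature key the standard applies before hashing, is omitted.

```python
import hashlib
import struct

def config_digest(vlan_to_msti):
    """MD5 digest over a VLAN-to-MSTI mapping table (simplified sketch).

    The table holds 4096 two-byte MSTI IDs (8192 bytes); VLANs not
    explicitly mapped belong to MSTI 0, so the table starts as all zeros.
    """
    table = bytearray(4096 * 2)          # 8192-byte table, all VLANs in MSTI 0
    for vlan, msti in vlan_to_msti.items():
        struct.pack_into(">H", table, vlan * 2, msti)  # big-endian 16-bit MSTI ID
    return hashlib.md5(bytes(table)).hexdigest()

# Switches with identical VLAN-MSTI mappings compute identical digests,
# regardless of the order the mappings were configured in:
a = config_digest({2: 2, 4: 4})
b = config_digest({4: 4, 2: 2})
print(a == b)                      # True
print(a == config_digest({2: 2})) # False: different mapping, different digest
```

This also illustrates why the Revision Level field exists: the digest compresses 8192 bytes into 16, so distinct tables can in principle collide, and a per-region revision level is a cheap extra discriminator.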

MSTP Topology Calculation


In MSTP, the entire Layer 2 network is divided into multiple
MST regions, which are interconnected by a single CST. In an
MST region, multiple spanning trees are calculated, each of
which is called an MSTI. Among these MSTIs, MSTI 0 is also
known as the internal spanning tree (IST). Like STP, MSTP
uses configuration BPDUs to calculate spanning trees, but the
configuration BPDUs are MSTP-specific.
Vectors
Root switching device ID: identifies the root switching device in
the CIST. The root switching device ID consists of the priority
value (16 bits) and MAC address (48 bits). The priority value is
the priority of MSTI 0.
External root path cost (ERPC): indicates the external root
path cost from the CIST regional root to the CIST root. ERPCs
saved on all switching devices in an MST region are the same.
If the CIST root is in an MST region, ERPCs saved on all
switching devices in the MST region are 0s.
Regional root ID: identifies the MSTI regional root. The
regional root ID consists of the priority value (16 bits) and MAC
address (48 bits).
Internal root path cost (IRPC): indicates the path cost from the
local bridge to the regional root.

Designated switching device ID: indicates the network bridge


that sends the BPDU.
Designated port ID: identifies the port on the designated
switching device connected to the root port on the local device.
The port ID consists of the priority value (4 bits) and port
number (12 bits). The priority value must be a multiple of 16.
Receiving port ID: identifies the port that receives the BPDU.
The port ID consists of the priority value (4 bits) and port
number (12 bits). The priority value must be a multiple of 16.
If the priority of a vector carried in the configuration message of a
BPDU received by a port is higher than the priority of the vector in the
configuration message saved on the port, the port replaces the saved
configuration message with the received one. In addition, the port
updates the global configuration message saved on the device. If the
priority of a vector carried in the configuration message of a BPDU
received on a port is equal to or lower than the priority of the vector in
the configuration message saved on the port, the port discards the
BPDU.
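Because every field in the vector follows the "numerically lower wins" rule, the comparison above amounts to a lexicographic comparison. A minimal sketch (the tuple layout and helper names are illustrative assumptions, not an implementation of any vendor's code):

```python
def better(received, saved):
    """Return True if the received vector is superior to the saved one.

    Each vector is a tuple: (root ID, ERPC, regional root ID, IRPC,
    designated bridge ID, designated port ID). Lower values mean higher
    priority, so a plain lexicographic tuple comparison implements the rule.
    """
    return received < saved

def port_id(priority, number):
    """Pack a 16-bit port ID: 4-bit priority (a multiple of 16) + 12-bit number."""
    assert priority % 16 == 0 and 0 <= priority <= 240
    return (priority // 16) << 12 | (number & 0xFFF)

# Same root, but the received BPDU advertises a lower ERPC, so it
# replaces the configuration message saved on the port:
v_recv = (0x1000, 20000, 0x2000, 0, 0x2000, port_id(128, 1))
v_saved = (0x1000, 40000, 0x2000, 0, 0x2000, port_id(128, 1))
print(better(v_recv, v_saved))  # True
```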

CST Calculation
CST and IST calculation is similar to the calculation in RSTP.
During CST calculation, an MST region is considered as a
network bridge and the ID of the network bridge is the IST
regional root ID.
CIST uses the following vectors: {root switching device ID,
ERPC, regional root ID, IRPC, designated switching device ID,
designated port ID, receiving port ID}. CST uses the following
vectors: {CIST root, ERPC, regional root ID, designated port ID,
receiving port ID}.
Topology description:
Assume that S1, S4, and S7 are regional roots in
Region1, Region2, and Region3 respectively. S1 has
the highest priority, S4 has the lowest priority, and the
cost of each path is the same.
Each MST region is considered as a network bridge,
and the ID of the network bridge is the regional root ID.
Each MST region sends a BPDU with itself as the CIST
root and external cost of 0 to other MST regions.
Through RSTP calculation, S1 is the CIST root.
Through ERPC comparison, the port of each regional
root connected to Region1 is the master port.
Through comparison of priorities in regional root IDs,
the regional edge port is determined.

IST Calculation
CST and IST calculation is similar to the calculation in RSTP.
MSTP calculates an IST for each MST region, and computes a
CST to interconnect MST regions. The CST and ISTs
constitute a CIST for the entire network.
CIST uses the following vectors: {root switching device ID,
ERPC, regional root ID, IRPC, designated switching device ID,
designated port ID, receiving port ID}. IST uses the following
vectors: {CIST root, IRPC, designated bridge ID, designated
port ID, receiving port ID}.
Topology description:
After CST calculation is complete, S1, S4, and S7 are
regional roots in Region1, Region2, and Region3
respectively. In this situation, the regional root is the
network bridge closest to the CIST root but not the
network bridge with the highest priority.
The role of a port on each network bridge is determined
based on the regional root as the root bridge and IRPC,
and then the IST is obtained.
Network bridges in an MST region compare IRPCs to
determine the IST root port.
Port roles in the IST are determined based on priorities
in BPDUs.

Region1 Calculation
In an MST region, MSTP calculates an MSTI for each VLAN
based on mappings between VLANs and MSTIs. Each MSTI is
calculated independently. The calculation process is similar to
the process for STP to calculate a spanning tree.
Topology description:
In Region1, VLAN 2 maps to MSTI 2, VLAN 4 to MSTI
4, and other VLANs to MSTI 0.
Different priorities are specified for network bridges in
different MSTIs. Assume that S2 is the root bridge in
MSTI 2 and S3 is the root bridge in MSTI 4.
In MSTI 2, S2, S1, and S3 are in descending order of
priority. Through calculation, the port on S3 connected
to S1 is blocked.
In MSTI 4, S3, S1, and S2 are in descending order of
priority. Through calculation, the port on S2 connected
to S1 is blocked.
MSTIs have the following characteristics:
The spanning tree is calculated independently for each MSTI,
and spanning trees of MSTIs are independent of each other.
MSTP calculates the spanning tree for an MSTI in a manner
similar to STP.
Spanning trees of MSTIs can have different roots and
topologies.

Each MSTI sends BPDUs in its spanning tree.


The topology of each MSTI is configured by using commands.
A port can be configured with different parameters for different
MSTIs.
A port can play different roles or have different statuses in
different MSTIs.

Region2 Calculation
Topology description:
In Region2, VLAN 2 maps to MSTI 2, VLAN 3 to MSTI
3, and other VLANs to MSTI 0.
Different priorities are specified for network bridges in
different MSTIs. Assume that S5 is the root bridge in
MSTI 2 and S6 is the root bridge in MSTI 3.
In MSTI 2, S5, S4, and S6 are in descending order of
priority. Through calculation, the port on S6 connected
to S4 is blocked.
In MSTI 3, S6, S4, and S5 are in descending order of
priority. Through calculation, the port on S5 connected
to S4 is blocked.

Region3 Calculation
Topology description:
In Region3, VLAN 2 maps to MSTI 2, VLAN 4 to MSTI
4, and other VLANs to MSTI 0.
Different priorities are specified for network bridges in
different MSTIs. Assume that S9 is the root bridge in
MSTI 2 and S8 is the root bridge in MSTI 4.
In MSTI 2, S9, S10, S8, and S7 are in descending
order of priority. Through calculation, the port on S7
connected to S8 and the port on S8 connected to S10
are blocked.
In MSTI 4, S8, S7, S10, and S9 are in descending
order of priority. Through calculation, the port on S9
connected to S7 and the port on S10 connected to S7
are blocked.

MSTI Calculation
After CIST and MSTI calculations are complete, the mapping
between VLANs and MSTIs in each MST region is
independent.
On an MSTP-aware network, a VLAN packet is forwarded
along the following paths:
MSTI including the IST in an MST region
CST among MST regions

Interoperability between MSTP and RSTP


An RSTP or STP-enabled network bridge considers an MST
region as the RSTP-enabled bridge with the bridge ID as the
regional root ID.
When an RSTP or STP-enabled network bridge receives an
MST BPDU, it obtains the CIST root, ERPC, regional root ID,
and designated port ID in the MST BPDU as the RID, RPC,
BID, and PID.
When an MSTP-enabled network bridge receives an STP or
RST BPDU, it obtains the RID, RPC, BID, and PID as the
CIST root, ERPC, regional root ID, and designated port ID.
The BID is used as the regional root ID and designated switch
ID, and the IRPC is 0.

In MSTP, the P/A mechanism works as follows:


The upstream device sends a Proposal packet to the
downstream device, requesting fast switching. After receiving
the Proposal packet, the downstream device sets its port
connecting to the upstream device to the root port and blocks
all non-edge ports.
The upstream device continues to send an Agreement packet.
After receiving the Agreement packet, the root port enters the
Forwarding state.
The downstream device replies with an Agreement packet.
After receiving the Agreement packet, the upstream device
sets its port connecting to the downstream device to the
designated port, and the port enters the Forwarding state.
By default, Huawei datacom devices use the enhanced P/A mechanism.
To enable a Huawei datacom device to communicate with third-party
devices that use the ordinary P/A mechanism, run the stp
no-agreement-check command to configure the ordinary P/A mechanism
on the Huawei datacom device.

Case description
S1, S2, and S3 must be in descending order of priority to meet
requirements 2 and 3.

Command usage
The stp mode command sets the working mode of a spanning
tree protocol on a switching device.
The stp root command configures a switching device as the
root bridge or secondary root bridge of a spanning tree.
The stp priority command sets the priority of the switching
device in a spanning tree.
The stp cost command sets the path cost of a port in a
spanning tree.
Parameters
stp mode { mstp | rstp | stp }
mstp: indicates the MSTP mode.
rstp: indicates the RSTP mode.
stp: indicates the STP mode.
stp [ instance instance-id ] root { primary | secondary }
instance instance-id: specifies the ID of a spanning tree
instance. It needs to be specified in MSTP.
primary: indicates that the switching device functions as
the primary root bridge of a spanning tree.
secondary: indicates that the switching device functions
as the secondary root bridge of a spanning tree.

stp [ instance instance-id ] priority priority


priority priority: specifies the priority of the switching
device in a spanning tree. The priority ranges from 0 to
61440. The value is a multiple of 4096, such as 0, 4096
and 8192. The default is 32768.
stp [ instance instance-id ] cost cost
cost: specifies the path cost of a port. When the path
cost of a port changes, spanning tree recalculation will
be performed.
Precautions
On an STP/RSTP/MSTP network, each spanning tree has only
one root bridge, which is responsible for sending BPDUs and
connecting devices on the entire network. Because the root
bridge plays a key role on the network, a high-performance
switching device at an appropriate layer of the network hierarchy
should be selected as the root bridge. Such a device may not have
the highest priority, so you can run the stp root command to
configure a switching device as the root bridge of a spanning tree.
A switching device in a spanning tree cannot function as both
the primary and secondary root bridges.
After the stp root command is run to configure a switching
device as the primary root bridge, the priority value of the
switching device is 0 in the spanning tree and the priority
cannot be modified.
After the stp root command is run to configure a switching
device as the secondary root bridge, the priority value of the
switching device is 4096 in the spanning tree and the priority
cannot be modified.
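The priority rules above (values are multiples of 4096 in 0 to 61440, stp root primary forces 0, stp root secondary forces 4096) can be illustrated with a small sketch; the function is a hypothetical model of the behavior, not VRP internals:

```python
VALID_PRIORITIES = set(range(0, 61440 + 1, 4096))  # 0, 4096, 8192, ..., 61440

def effective_priority(configured, root=None):
    """Bridge priority a switch would use in a spanning tree.

    root: None, "primary", or "secondary". When stp root is configured,
    the priority is fixed (0 or 4096) and cannot be modified.
    """
    if root == "primary":
        return 0
    if root == "secondary":
        return 4096
    if configured not in VALID_PRIORITIES:
        raise ValueError("priority must be a multiple of 4096 in 0..61440")
    return configured

print(effective_priority(32768))            # 32768: the default priority
print(effective_priority(8192, "primary"))  # 0: stp root primary overrides it
```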

Case description
In the preceding topology:
Requirement 1 involves interoperability between RSTP
and STP.
Requirement 2 involves the stp root command usage.
Requirement 3 involves the edge port, BPDU filtering,
and BPDU protection.

Command usage
The stp mcheck command configures a port to automatically
switch from the STP mode back to the RSTP/MSTP mode.
The stp edged-port default command configures all ports on
a switching device as edge ports.
The stp bpdu-filter default command configures all ports on a
switching device as BPDU-filter ports.
The stp bpdu-protection command enables BPDU protection
on a switching device.
The stp root-protection command enables root protection on
a port.
Precautions
After the stp bpdu-filter default and stp edged-port default
commands are run in the system view, none of the ports on
the device will send BPDUs or negotiate with the directly
connected port on the remote device, and all the ports are in
the Forwarding state. This may lead to a loop and cause a
broadcast storm. Exercise caution when using the stp
bpdu-filter default and stp edged-port default commands in the
system view.
After BPDU protection is enabled on a switching device, the
switching device sets an edge port to the error-down state if the
edge port receives a BPDU, but the port retains its role as an
edge port.

The role of a designated port enabled with root protection cannot be
changed. When a designated port enabled with root protection
receives a BPDU with a higher priority, the port enters the
Discarding state and does not forward packets. If the port does not
receive any BPDUs with a higher priority after a given period of time
(generally two Forward Delay periods), the port automatically enters
the Forwarding state.

Case description
To meet requirement 3 and produce the alternate ports shown in
the preceding figure, S1 must be configured as the root bridge in
MSTI 2 and S3 as the root bridge in MSTI 3. Therefore, in MSTI 2,
S1 must be the root bridge, with S2, S3, and S4 in descending
order of priority; in MSTI 3, S3 must be the root bridge, with S1,
S4, and S2 in descending order of priority.

Command usage
The region-name command configures the MST region name
of a switching device.
The instance command maps a VLAN to an MSTI.
The revision-level command configures the revision level of
an MST region of a switching device. The default value is 0.
The active region-configuration command activates the
configuration of an MST region.
The stp loop-protection command enables loop protection on
a port.
Precautions
Two switching devices belong to the same MST region only
when they have the following identical configurations:
MST region name
Mappings between MSTIs and VLANs
MST region revision level
Loop protection
On a network running a spanning tree protocol, a
switching device maintains the status of the root port
and blocked port by continuously receiving BPDUs
from the upstream switching device.

If ports cannot receive BPDUs from the upstream switching device
due to link congestion or a unidirectional link failure, the
switching device will re-select a root port. The original root port
then becomes a designated port and the original blocked port
enters the Forwarding state. As a result, loops may occur on the
network.
Loop protection can be deployed to prevent this
problem. If the root port or alternate port cannot receive
BPDUs from the upstream device for a long period of
time after loop protection is enabled, the root port or
alternate port will send a notification message to the
NMS. The root port will enter the Discarding state, and
the alternate port remains in Blocking state and no
longer forwards packets. This prevents loops on the
network. The root port or alternate port restores the
Forwarding state after receiving BPDUs.

If the topology of an MSTI changes, the forwarding paths of VLANs that
are mapped to this MSTI change. As a result, ARP entries relevant to
these VLANs need to be updated. Based on the methods for processing
ARP entries, the convergence modes of a spanning tree protocol are
classified into fast and normal:
In fast mode, the switch directly deletes the ARP entries that
need to be updated in an ARP table.
In normal mode, the switch ages the ARP entries that need to
be updated in the ARP table. If the number of ARP probes for
aging ARP entries is larger than 0, the switch probes these
ARP entries before aging them.
In fast mode, frequent ARP entry deletion will affect services
and even may cause 100% CPU usage. As a result, packet
processing will time out, causing network flapping.

Unicast
In unicast mode, the amount of data transmitted on a network
is proportional to the number of users that require the data. If a
large number of users require the same data, the multicast
source must send many copies of data to these users,
consuming high bandwidth on the multicast source and
network. Therefore, the unicast mode is not suitable for batch
data transmission and is applicable only to networks with a
small number of users.
Broadcast
In broadcast mode, data is sent to all hosts on a network
segment regardless of whether they require the data. This
threatens information security and causes broadcast storms on
the network segment. Therefore, the broadcast mode is not
suitable for data transmission from a source to specified
destinations. In addition, the broadcast mode wastes network
bandwidth.
Multicast has the following advantages over unicast and broadcast:
Compared with the unicast mode, the multicast mode replicates
data and distributes the copies at network nodes as far from the
source as possible. Therefore, the amount of data and the level
of network resource consumption do not increase greatly when
the number of receivers increases.

Compared with the broadcast mode, the multicast mode transmits
data only to receivers that require the data. This saves network
resources and enhances data transmission security.
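The bandwidth argument above can be made concrete with a little arithmetic; the model and the numbers are purely illustrative:

```python
def source_link_load(mode, stream_mbps, receivers):
    """Traffic on the source's access link, in Mbit/s (illustrative model).

    Unicast: one copy per receiver leaves the source.
    Multicast: a single copy leaves the source; routers replicate it
    downstream only where paths to receivers diverge.
    """
    if mode == "unicast":
        return stream_mbps * receivers
    if mode == "multicast":
        return stream_mbps
    raise ValueError(mode)

# A hypothetical 4 Mbit/s video stream with 500 receivers:
print(source_link_load("unicast", 4, 500))    # 2000 Mbit/s on the source link
print(source_link_load("multicast", 4, 500))  # 4 Mbit/s, independent of receivers
```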

Multicast basic concepts


Multicast group: A group of receivers identified by an IP
multicast address. User hosts (or other receiver devices) that
have joined a multicast group become members of the group
and can identify and receive the IP packets destined for the
multicast group address.
Multicast source: A sender of multicast data. The server in the
topology is a multicast source. A multicast source can
simultaneously send data to multiple multicast groups. Multiple
multicast sources can simultaneously send data to the same
multicast group. A multicast source does not need to join any
multicast groups.
Multicast group member: A host that has joined a multicast
group. PC1 and PC2 in the following topology are multicast
group members. Memberships in a multicast group change
dynamically. Hosts can join or leave a multicast group anytime.
Members of a multicast group are located anywhere on a
network.
Multicast router: A router or Layer 3 switch that supports IP
multicast. The routers in the following topology are multicast
routers. In addition to multicast routing functions, multicast
routers connected to user network segments provide multicast
membership management.

Multicast service models are classified for receiver hosts and do not
affect multicast sources. All multicast data packets sent from a
multicast source use the IP address of the multicast source as the
source IP address and use a multicast group address as the
destination address. Depending on whether receiver hosts can select
multicast sources, two multicast models are defined: any-source
multicast (ASM) model and source-specific multicast (SSM) model. The
two models use multicast group addresses in different ranges.
ASM model: Receiver hosts can only specify the group they
want to join and cannot select multicast sources.
SSM model: Receiver hosts can specify the multicast sources
from which they want to receive multicast data when they join
a group. After joining the group, the hosts receive only the data
sent from the specified sources.

Multicast addresses
IP addresses 224.0.0.0 to 224.0.0.255 are reserved as
permanent group addresses by the Internet Assigned
Numbers Authority (IANA). In this address range, 224.0.0.0 is
not allocated, and the other addresses are used by routing
protocols for topology discovery and maintenance. These
addresses are locally valid. Packets with these addresses will
not be forwarded by routers regardless of the time-to-live (TTL)
values in the packets.
Addresses in the range of 224.0.1.0 to 231.255.255.255 and
233.0.0.0 to 238.255.255.255 are ASM group addresses and
are globally valid.
Addresses 232.0.0.0 to 232.255.255.255 are SSM group
addresses available to users and are globally valid.
Addresses 239.0.0.0 to 239.255.255.255 are local
administrative multicast addresses and are valid only in the
local administrative domain. Local administrative group
addresses are private addresses. A local administrative group
address can be used in different administrative domains.
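The address ranges listed above can be captured in a small classifier; this sketch uses Python's standard ipaddress module:

```python
import ipaddress

def classify_group(addr):
    """Classify an IPv4 multicast group address per the ranges above."""
    ip = ipaddress.IPv4Address(addr)
    if not ip.is_multicast:
        raise ValueError(f"{addr} is not a multicast address")
    if ip <= ipaddress.IPv4Address("224.0.0.255"):
        return "permanent (link-local, never forwarded)"
    if ipaddress.IPv4Address("232.0.0.0") <= ip <= ipaddress.IPv4Address("232.255.255.255"):
        return "SSM"
    if ip >= ipaddress.IPv4Address("239.0.0.0"):
        return "local administrative"
    # Remaining multicast space: 224.0.1.0-231.255.255.255
    # and 233.0.0.0-238.255.255.255
    return "ASM"

print(classify_group("224.0.0.5"))  # permanent (link-local, never forwarded)
print(classify_group("232.1.1.1"))  # SSM
print(classify_group("239.1.1.1"))  # local administrative
print(classify_group("225.0.1.1"))  # ASM
```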

Mapping from IPv4 multicast addresses to MAC addresses


The first four bits of an IPv4 multicast address are fixed at 1110,
and the leftmost 25 bits of a multicast MAC address are fixed
(01-00-5e followed by a 0 bit). Therefore, only 23 of the remaining
28 bits of the IP address are mapped to the MAC address, and 5 bits
of the IP address are lost. As a result, 32 multicast IP addresses
are mapped to the same MAC address. For example, IP multicast
addresses 224.0.1.1, 224.128.1.1, 225.0.1.1, and 239.128.1.1 are
all mapped to MAC multicast address 01-00-5e-00-01-01. Address
conflicts must be considered in address assignment.
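The mapping described above can be reproduced in a few lines; this sketch shows the 32-to-1 collision for the addresses cited in the example:

```python
import ipaddress

def multicast_mac(group):
    """Map an IPv4 multicast address to its multicast MAC address.

    The MAC prefix 01-00-5e plus a 0 bit fixes the top 25 bits; only the
    low 23 bits of the IP address are copied, so 5 bits are discarded and
    2^5 = 32 IP addresses collide on one MAC address.
    """
    low23 = int(ipaddress.IPv4Address(group)) & 0x7FFFFF  # keep low 23 bits
    mac = (0x01005E << 24) | low23                        # prepend fixed prefix
    return "-".join(f"{(mac >> s) & 0xFF:02x}" for s in range(40, -1, -8))

# All four addresses from the example map to 01-00-5e-00-01-01:
for g in ("224.0.1.1", "224.128.1.1", "225.0.1.1", "239.128.1.1"):
    print(g, "->", multicast_mac(g))
```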

IGMP
IGMP is deployed between multicast routers and user hosts.
On a multicast router, IGMP is configured on interfaces
connected to hosts.
On hosts, IGMP allows group members to dynamically join and
leave multicast groups. On routers, IGMP manages and
maintains group memberships and exchanges information with
upper-layer multicast routing protocols.
PIM
PIM has two modes: PIM-DM and PIM-SM.
It must be enabled on all interfaces of all multicast routers.
It provides multicast routing and forwarding, and maintains the
multicast routing table based on network topology changes.
IGMP snooping
IGMP snooping is deployed in VLANs on Layer 2 switches
between multicast routers and hosts.
It listens on IGMP messages exchanged between routers and
hosts to create and maintain a Layer 2 multicast forwarding
table. In this manner, multicast data can be forwarded on a
Layer 2 network.

IGMP
IGMP is an IPv4 group membership management protocol in
the TCP/IP protocol suite. IP hosts use IGMP to report their
group memberships to any immediately-neighboring multicast
routers.
IGMP is deployed between multicast routers and hosts. On a
multicast router, IGMP is configured on interfaces connected
to hosts.
On hosts, IGMP allows group members to dynamically join and
leave multicast groups. On routers, IGMP manages and
maintains group memberships and exchanges information with
upper-layer multicast routing protocols.
The IGMP versions are backward compatible. Therefore, a
multicast router running a later IGMP version can identify
Membership Report messages sent from hosts running an
earlier IGMP version, although the IGMP messages in different
versions use different formats.
All of the IGMP versions support the any-source multicast
(ASM) model. IGMPv3 can be independently used in the
source-specific multicast (SSM) model, whereas IGMPv1 and
IGMPv2 must be used with SSM mapping.

IGMP messages are encapsulated in IP packets. IGMPv1 defines the following types of messages:
General Query: Sent by a querier to all hosts and routers on
the shared network segment to discover which multicast
groups have members on the network segment.
Report: Sent by a host to request to join a multicast group or
respond to a General Query message.
How IGMPv1 works
IGMPv1 uses a query-report mechanism to manage multicast
groups. When multiple multicast routers exist on a network
segment, one router is elected as the IGMP querier to send
Query messages. In IGMPv1 implementation, a unique Assert
winner or designated router (DR) is elected by Protocol
Independent Multicast (PIM) to work as the querier. (The
election mechanism will be described later). The querier is the
only device that sends Membership Query messages on the
local network segment.
General query and report
In the multicast network, R1 and R2 connect to a user network
segment with three receivers: PC1, PC2, and PC3. R1 is the
querier on the network segment. PC1 and PC2 want to receive
data sent to group G1, and PC3 wants to receive data sent to
group G2. The general query and report process is as follows:

The IGMP querier (R1) sends a General Query message with the destination address 224.0.0.1
(indicating all hosts and routers on the same network
segment). The IGMP querier sends General Query
messages at intervals. The interval can be configured
using a command, and the default interval is 60
seconds.
All hosts on the network segment receive the General
Query message. PC1 and PC2 then start a timer for G1
(Timer-G1), and PC3 starts a timer for G2 (Timer-G2).
The timer length is a random value between 0 and 10,
in seconds.
The host with the timer expiring first sends a Report
message for the multicast group. In this example,
Timer-G1 on PC1 expires first, and PC1 sends a
Report message with the destination address as G1.
When PC2 detects the Report message sent by PC1,
PC2 stops Timer-G1 and does not send any Report
messages for G1. This mechanism reduces the
number of Report messages transmitted on the
network segment, lowering loads on multicast routers.
When Timer-G2 on PC3 expires, PC3 sends a Report
message with the destination address as G2 to the
network segment.
After the routers receive the Report message, they
know that multicast groups G1 and G2 have members
on the local network segment. The routers use the
multicast routing protocol to create (*, G1) and (*, G2)
entries, in which * stands for any multicast source.
Once the routers receive data sent to G1 and G2, they
forward the data to this network segment.
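The suppression behavior in the steps above can be modeled in a few lines. This is a simplified sketch (hypothetical host names, uniform random timers in 0-10 s as in the text, no network delay), not an IGMP implementation: for each group, only the host whose timer fires first sends a Report.

```python
import random

def simulate_general_query(memberships, seed=None):
    """Pick, per group, the one host whose random response timer
    (0-10 s) fires first; the other members suppress their Reports.
    `memberships` maps host name -> group it belongs to."""
    rng = random.Random(seed)
    timers = {host: rng.uniform(0, 10) for host in memberships}
    reporters = {}
    for host, group in memberships.items():
        best = reporters.get(group)
        if best is None or timers[host] < timers[best]:
            reporters[group] = host
    return reporters  # one reporting host per group

# PC1/PC2 are members of G1, PC3 of G2 (topology in the text):
members = {"PC1": "G1", "PC2": "G1", "PC3": "G2"}
print(simulate_general_query(members, seed=1))
```

Whichever of PC1/PC2 reports first, the router learns the same fact: G1 has at least one member on the segment, which is all it needs to create the (*, G1) entry.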
A member joins a group
A new host PC4 connects to the network segment. PC4 wants
to join multicast group G3 but detects no multicast data for G3.
In this case, PC4 immediately sends a Report message for G3
without waiting for a General Query message. After receiving
the Report message, the routers know that a member of G3
has connected to the network segment, and they create a (*,
G3) entry. When the routers receive data sent to G3, they
forward the data to this network segment.
A member leaves a group
IGMPv1 does not define a Leave message. After a host leaves
a multicast group, it no longer responds to General Query
messages. Assume that PC4 has left group G3. It does not
send Report messages for G3 when receiving General Query
messages.

Because there is no other member of G3, the routers no longer receive Report messages for G3. After a period of time (130
seconds, Membership timeout interval = IGMP general query
interval x Robustness variable + Maximum response time), the
routers delete the multicast forwarding entry of G3.
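The 130-second figure follows directly from the formula quoted above with the default values (query interval 60 s, robustness variable 2, maximum response time 10 s). A quick check of the arithmetic:

```python
# Sketch of the membership timeout formula quoted in the text,
# with the default IGMP values.
def membership_timeout(query_interval=60, robustness=2, max_response=10):
    """Membership timeout = general query interval x robustness
    variable + maximum response time."""
    return query_interval * robustness + max_response

print(membership_timeout())  # 130 seconds, as stated in the text
```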

IGMPv2 defines two new message types in addition to the General Query and Report messages:
Group-Specific Query: sent by a querier to a specified group
on the local network segment to check whether the group has
members.
Leave: sent by a host to notify routers on the local network
segment that it has left a group.
IGMPv2 modifies the General Query message format by
adding the Max Response Time field in the message. The field
value controls the response speed of group members and is
configurable.
Querier election
IGMPv2 defines an independent querier election mechanism.
When multiple multicast routers are available on a shared
network segment, the router with the smallest IP address is
elected as the querier. IGMPv1 depends on upper-layer
multicast protocols such as PIM for querier election.
Topology description
Each IGMPv2 router considers itself as a querier when
it starts and sends a General Query message to all
hosts and routers on the local network segment.
When other routers receive the General Query
message, they compare the source IP address of the
message with their own interface IP addresses.

The router with the smallest IP address becomes the querier, and the other routers are non-queriers. In this
network, R1 has a smaller interface IP address than R2,
so R1 becomes the querier.
All non-querier routers start a timer (Other Querier
Present Timer, Timer length = Robustness variable x
IGMP general query interval + (1/2) x Maximum
response time. If the robustness variable, IGMP
general query interval, and maximum response time
are all default values, the Other Querier Present Timer
length is 125 seconds.) If non-querier routers receive a
Query message from the querier before the timer
expires, they reset the timer. If non-querier routers
receive no Query message from the querier when the
timer expires, they trigger election of a new querier.
Leave mechanism
In IGMPv2, the following process occurs when PC3 wants to leave multicast group G2, assuming PC3 was the last member to respond to queries for G2:
PC3 sends a Leave message for G2 to all multicast
routers on the local network segment. The destination
address of the Leave message is 224.0.0.2.
When the querier receives the Leave message, it
sends Group-Specific Query messages for G2 at
intervals to check whether G2 has other members on
the network segment. The sending interval and number
of Group-Specific Query messages sent by the querier
are configurable. By default, the querier sends a total of
two Group-Specific Query messages, at an interval of 1
second. In addition, the querier starts the membership
timer (Timer-Membership, Timer length = Interval for
sending Group-Specific Query messages x Number of
messages sent).
If G2 has no other member on the network segment,
the routers cannot receive any Report message for G2.
After Timer-Membership expires, the routers delete the
downstream interface connected to the network
segment from the (*, G2) entry. Then the routers no
longer forward data of G2 to the network segment.
If G2 has other members on the network segment, the
members send a Report message for G2 within the
maximum response time. The routers continue
maintaining membership of G2.

IGMPv3 was developed to support the source-specific multicast (SSM) model. IGMPv3 messages can contain multicast source information so
that hosts can receive data sent from a specific source to a specific
group.
IGMPv3 also defines two types of messages: Query and Report.
Compared with IGMPv2, IGMPv3 has the following changes:
In addition to General Query and Group-Specific Query
messages, IGMPv3 defines a new Query message type:
Group-and-Source-Specific Query. A querier sends a
Group-and-Source-Specific Query message to members of a
specific group on the shared network segment, to check whether
the group members want data from specific sources. A
Group-and-Source-Specific Query message carries one or more
multicast source addresses.
A host can send a Report message to notify a multicast router
that it wants to join a multicast group and receive data from
specified multicast sources. IGMPv3 supports source filtering
and defines two filter modes: INCLUDE and EXCLUDE.
Group-source mappings are represented as (G, INCLUDE, (S1,
S2...)) or (G, EXCLUDE, (S1, S2...)). The (G, INCLUDE, (S1,
S2...)) entry indicates that a host only wants to receive data
sent from the listed multicast sources to group G. The (G,
EXCLUDE, (S1, S2...)) entry indicates that a host wants to
receive data sent from all multicast sources except the listed
ones to group G.

Group Record types in IGMPv3 Report messages


IS_IN
Indicates that the source filter mode is INCLUDE for a
multicast group. That is, members of the group want to
receive only data sent from the specified sources.
IS_EX
Indicates that the source filter mode is EXCLUDE for a
multicast group. That is, members of the group want to
receive data sent from multicast sources except the
specified sources.
TO_IN
Indicates that the source filter mode for a multicast
group has changed from EXCLUDE to INCLUDE. If the
source list is empty, the members have left the
multicast group.
TO_EX
Indicates that the source filter mode for a multicast
group has changed from INCLUDE to EXCLUDE.
ALLOW
Indicates that members of a multicast group want to
receive data from the specified multicast sources in
addition to the current sources. If the source filter mode
for the multicast group is INCLUDE, the specified
sources are added to the source list. If the source filter
mode is EXCLUDE, the specified sources are deleted
from the source list.

BLOCK
Indicates that members of a multicast group no longer
want to receive data from the specified multicast
sources. If the source filter mode for the multicast
group is INCLUDE, the specified sources are deleted
from the source list. If the source filter mode is
EXCLUDE, the specified sources are added to the
source list.
An IGMPv3 Report message can carry multiple groups, whereas an
IGMPv1 or IGMPv2 Report message can carry only one group. IGMPv3
greatly reduces the number of messages transmitted on a network.
Unlike IGMPv2, IGMPv3 does not define a Leave message. Group
members send Report messages of a specified type to notify multicast
routers that they have left a group. For example, if a member of group
225.1.1.1 wants to leave the group, it sends a Report message with
(225.1.1.1, TO_IN, (0)).
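The Group Record semantics above amount to a small state machine over a host's (filter mode, source set) pair. The following is a simplified sketch of those transitions, with per-source timers omitted; it is not an IGMPv3 implementation, and the `accepts` helper is added here for illustration:

```python
def apply_record(mode, sources, record_type, record_sources):
    """Apply one IGMPv3 Group Record to a (filter mode, source set)
    state, following the IS_IN/IS_EX/TO_IN/TO_EX/ALLOW/BLOCK
    semantics summarized above."""
    rs = set(record_sources)
    if record_type in ("IS_IN", "TO_IN"):
        return "INCLUDE", rs          # TO_IN with empty rs means "leave"
    if record_type in ("IS_EX", "TO_EX"):
        return "EXCLUDE", rs
    if record_type == "ALLOW":
        # INCLUDE: add wanted sources; EXCLUDE: stop excluding them
        return (mode, sources | rs) if mode == "INCLUDE" else (mode, sources - rs)
    if record_type == "BLOCK":
        # INCLUDE: drop unwanted sources; EXCLUDE: start excluding them
        return (mode, sources - rs) if mode == "INCLUDE" else (mode, sources | rs)
    raise ValueError(record_type)

def accepts(mode, sources, s):
    """Would traffic from source s be delivered under this filter state?"""
    return (s in sources) if mode == "INCLUDE" else (s not in sources)

state = ("INCLUDE", {"S1"})
state = apply_record(*state, "ALLOW", {"S2"})  # now wants S1 and S2
print(state, accepts(*state, "S1"), accepts(*state, "S3"))
```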

If IGMPv1 or IGMPv2 is running between a host and its upstream router, the host cannot select multicast sources when it joins group G. The
host receives data from both S1 and S2, regardless of whether it
requires the data. If IGMPv3 is running between the host and its
upstream router, the host can choose to receive only data from S1
using either of the following methods:
Method 1: Send an IGMPv3 Report (G, IS_IN, (S1)),
requesting to receive only the data sent from S1 to G.
Method 2: Send an IGMPv3 (G, IS_EX, (S2)), notifying the
upstream router that it does not want to receive data from S2.
Only data sent from S1 is then forwarded to the host.

Compatibility with IGMPv1 routers
When IGMPv2 hosts discover an IGMPv1 router, they must
send IGMP Report messages to the router and cannot send
Leave messages.
If there are both IGMPv1 and IGMPv2 routers on a network
segment, the querier must send IGMPv1 messages.
Compatibility with IGMPv1 hosts
IGMPv2 hosts must allow their Report messages to be
suppressed by IGMPv1 Report messages. Otherwise, the
querier will not know of the existence of IGMPv1 hosts on the shared
network segment. If the querier is an IGMPv2 router and
receives a Leave message for a group (there are IGMPv1
hosts in the group), the IGMPv1 hosts will not receive traffic for
this group.
If an IGMPv2 router detects IGMPv1 hosts on the local
network segment, the router ignores any subsequent Leave
messages received.

SSM mapping is implemented based on static SSM mapping entries. A multicast router converts (*, G) information in IGMPv1 and IGMPv2
Report messages to (S, G) information according to static SSM
mapping entries, so as to provide the SSM service for IGMPv1 and
IGMPv2 hosts. By default, SSM group addresses range from 232.0.0.0
to 232.255.255.255.
IGMP SSM mapping does not apply to IGMPv3 Report messages. To
enable hosts running any IGMP version on a network segment to
obtain the SSM service, IGMPv3 must run on interfaces of multicast
routers on the network segment.
With SSM mapping entries configured, a router checks the group
address G in each IGMPv1 or IGMPv2 Report message received, and
processes the message based on the check result:
If G is in the range of any-source multicast (ASM) group
addresses, the router provides the ASM service for the host.
If G is in the range of SSM group addresses:
When the router has no SSM mapping entry matching G,
it does not provide the SSM service and drops the
Report message.
If the router has an SSM mapping entry matching G, it
converts (*, G) information in the Report message into (S,
G) information and provides the SSM service for the host.
Topology description

On an SSM network, PC1 runs IGMPv3, PC2 runs IGMPv2, and PC3 runs IGMPv1. PC2 and PC3 cannot run IGMPv3. To
provide the SSM service for all the hosts on the network
segment, IGMP SSM mapping must be configured on R1.
Before SSM mapping is enabled, the group-source mappings
on R1 are as follows:
Group 232.0.0.0/8 mapped to source 10.10.1.1
Group 232.1.0.0/16 mapped to source 10.10.2.2
Group 232.1.1.0/24 mapped to source 10.10.3.3
After SSM mapping is enabled on R1, R1 checks group
addresses of received packets to see whether the group
addresses are in the SSM group address range. If the group
addresses are in the SSM group address range, R1 generates
the following multicast entries according to the configured SSM
mapping entries. If a group address is mapped to multiple
sources, R1 generates multiple (S, G) entries. The following are
entries generated according to information in Report messages
sent from PC2 and PC3:
(10.10.1.1, 232.1.2.2)
(10.10.2.2, 232.1.2.2)
(10.10.1.1, 232.1.3.3)
(10.10.2.2, 232.1.3.3)
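The entry generation described above can be sketched directly from the example's three static mappings: every configured prefix that contains the reported group contributes one (S, G) entry. A minimal model (not vendor code):

```python
import ipaddress

SSM_RANGE = ipaddress.ip_network("232.0.0.0/8")

# Static SSM mappings configured on R1 in the example:
MAPPINGS = {
    "232.0.0.0/8": "10.10.1.1",
    "232.1.0.0/16": "10.10.2.2",
    "232.1.1.0/24": "10.10.3.3",
}

def ssm_map(group):
    """Convert a (*, G) Report from an IGMPv1/v2 host into (S, G)
    entries. Each mapping whose prefix contains G contributes one
    source, matching the entry list R1 builds in the example."""
    g = ipaddress.ip_address(group)
    if g not in SSM_RANGE:
        return []  # ASM range: SSM mapping does not apply
    return sorted((src, group)
                  for prefix, src in MAPPINGS.items()
                  if g in ipaddress.ip_network(prefix))

print(ssm_map("232.1.2.2"))  # two (S, G) entries: 10.10.1.1 and 10.10.2.2
print(ssm_map("232.1.3.3"))
```

232.1.2.2 and 232.1.3.3 fall inside 232.0.0.0/8 and 232.1.0.0/16 but outside 232.1.1.0/24, which is why each gets exactly two sources in the list above.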

A host sends a Report message to the upstream device. The upstream device can send multicast packets to the host after receiving the Report message.
IGMP messages are encapsulated in IP packets (Layer 3 packets).
Layer 2 devices between hosts and multicast routers, however, cannot
process Layer 3 information carried in IP packets. In addition, Layer 2
devices cannot learn any MAC multicast address because the source
MAC addresses of link layer data frames are not MAC multicast
addresses. When a Layer 2 device receives a data frame with a
multicast destination MAC address, the device cannot find a matching
entry in its MAC address table. Consequently, the device broadcasts
the multicast packet. This wastes bandwidth resources and poses
threats to network security.

Concepts

A router port is a link layer device's port towards a multicast router. The link layer multicast device
receives packets through the router port. Router ports
are classified into two types:
Dynamic router port: A port that can receive
IGMP Query messages or PIM Hello messages
whose source addresses are not 0.0.0.0.
Dynamic router ports are dynamically
maintained based on protocol packets
exchanged between multicast devices and
hosts. Each dynamic router port has a timer.
When the timer expires, the router port ages
out.
Static router port: Manually specified using a
command. Static router ports will not age out.
A group member port is a port towards user hosts. A
link layer multicast device sends multicast packets to
receiver hosts through group member ports. Group
member ports are classified into two types:
Dynamic member port: A port that can receive
IGMP Report messages. Dynamic member
ports are dynamically maintained based on
protocol packets exchanged between multicast
devices and hosts.

Each dynamic member port has a timer. When the timer expires, the member port ages out.
Static member port: Manually specified using a
command. Static member ports will not age out.
The outbound port list is key information in a Layer 2
multicast forwarding entry; it includes both router ports and member ports.
Working mechanisms
When a router port on an Ethernet switch receives an
IGMP General Query message, the switch resets the
aging timer of the router port. If the port that receives
the General Query message is not a router port, the
switch starts the aging timer for the port. (The aging
time is 180 seconds or the Holdtime value carried in
PIM Hello messages received by the switch. The
default Holdtime value is 105 seconds.)
When an Ethernet switch receives an IGMP Report
message, it checks whether there is a MAC multicast
group matching the IP multicast group that the user
wants to join.
If the MAC multicast group does not exist, the
switch creates the MAC multicast group, adds
the port that receives the Report message to
the MAC multicast group, and starts the aging
timer on the port (Timer length = Robustness
variable x General query interval + Maximum
response time). In addition, the switch adds all
router ports in the same VLAN as the member
port to the MAC multicast forwarding entry. It
then creates an IP multicast group and adds the
port that receives the Report message to the IP
multicast group.
If the MAC multicast group exists but the port
that receives the IGMP Report message is not
in the group, the switch adds the port to the
MAC multicast group and starts the aging timer
on the port. The switch then checks whether the
IP multicast group exists. If the IP multicast
group does not exist, the switch creates the IP
multicast group and adds the port to it. If the IP
multicast group exists, the switch adds the port
to the group directly.
If the MAC multicast group exists and the port
that receives the IGMP Report message is
already in the group, the switch resets the aging
timer on the port.
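The Report-handling steps above can be condensed into a small table-update sketch. This is a simplified model (single combined table keyed by VLAN and group, timers omitted), not switch firmware; the port names are hypothetical:

```python
def handle_report(table, vlan, port, group, router_ports):
    """Sketch of the switch's IGMP Report handling described above:
    create the group entry on the first Report, attach the receiving
    port as a member port, and include the VLAN's router ports so
    traffic from upstream reaches the members."""
    entry = table.setdefault((vlan, group),
                             {"member_ports": set(), "router_ports": set()})
    entry["member_ports"].add(port)       # add (or refresh) the member port
    entry["router_ports"] |= set(router_ports)
    return entry

table = {}
handle_report(table, vlan=10, port="Eth0/1", group="225.1.1.1",
              router_ports={"Eth0/24"})
handle_report(table, vlan=10, port="Eth0/2", group="225.1.1.1",
              router_ports={"Eth0/24"})
print(table[(10, "225.1.1.1")])
```

A second Report on a port already in the entry would, in the real mechanism, only reset that port's aging timer; the sketch models that as the idempotent `add`.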

IGMP Leave message: When an Ethernet switch receives an IGMP Leave message for a group on a
port, it sends an IGMP Group-Specific Query
message to the port to check whether the group has
other members on the port. At the same time, the
switch starts the query response timer (Timer length =
Group-specific query interval x Robustness variable).
If the switch does not receive any IGMP Report
message for the group when the query response
timer expires, it deletes the port from the matching
MAC multicast group. If the MAC multicast group has
no member port, the switch requests the upstream
multicast router to delete this branch from the
multicast tree.

Layer 2 multicast
If users in different VLANs require the same multicast data, the
upstream router still has to send multiple copies of identical
multicast data to different VLANs.
Users in VLAN 2 and VLAN 3 need to receive the same
multicast data flow. Multicast router R1 replicates the multicast
data in each VLAN and sends two copies of data to
downstream switch S1. This wastes bandwidth between the
router and Layer 2 device and increases loads on the router.
Multicast VLAN
The multicast VLAN feature allows Layer 2 network devices to
replicate multicast data across VLANs.
After the multicast VLAN function is configured on S1, R1
replicates multicast data in the multicast VLAN (VLAN 4) and
sends only one copy to S1. As the router does not need to
replicate multicast data in VLAN 2 and VLAN 3, network
bandwidth is conserved and loads on the router are reduced.
Concepts
Multicast VLAN: VLAN to which a network-side interface
belongs. A multicast VLAN is used to aggregate multicast data
flows. One multicast VLAN can be bound to multiple user
VLANs.
User VLAN: VLAN to which a user-side interface belongs. A
user VLAN is used to receive multicast data flows from the
multicast VLAN. A user VLAN can be bound only to one
multicast VLAN.
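The replication decision on S1 reduces to an intersection: the switch copies the single upstream flow from the multicast VLAN into each bound user VLAN that has members. A minimal sketch using the example's VLAN numbers (VLAN 4 bound to user VLANs 2 and 3):

```python
# Hypothetical binding on switch S1: multicast VLAN 4 serves user VLANs 2, 3.
MVLAN_BINDINGS = {4: {2, 3}}

def replicate(mvlan, member_vlans):
    """The router sends one copy into the multicast VLAN; the switch
    replicates it only into bound user VLANs that have group members."""
    return sorted(MVLAN_BINDINGS.get(mvlan, set()) & member_vlans)

print(replicate(4, {2, 3}))  # both bound user VLANs get a copy
print(replicate(4, {3, 5}))  # VLAN 5 is not bound, so only VLAN 3
```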

We have learned about the Internet Group Management Protocol (IGMP). IGMP runs between receiver hosts and multicast
routers, whereas a multicast routing protocol needs to run between
routers.
A multicast routing protocol is used to create and maintain multicast
routes, and to forward multicast data packets correctly and efficiently.
Multicast routes construct a unidirectional loop-free data transmission
path from a data source to multiple receivers. This transmission path is
a multicast distribution tree. Multicast routing protocols can be intra-domain or inter-domain protocols. This course introduces PIM, a typical
intra-domain multicast routing protocol.

PIM router
Routers with PIM enabled on interfaces are called PIM routers.
A multicast distribution tree contains the following types of PIM
routers:
Leaf router: A PIM router directly connected to user
hosts, which may or may not be multicast group members.
First-hop router: The PIM router directly connected to a
multicast source on the multicast forwarding path and
responsible for forwarding multicast data from the
multicast source.
Last-hop router: The PIM router directly connected to a
multicast group member on the multicast forwarding
path and responsible for forwarding multicast data to
the member.

Multicast distribution tree


On a PIM network, a point-to-multipoint multicast forwarding
path is set up for each multicast group on routers. The
multicast forwarding path is in a tree topology, so it is also
called a multicast distribution tree.
There are two types of multicast distribution trees: source
trees and shared trees.
Source tree
A source tree is rooted at a multicast source and combines the
shortest paths from the source to receivers.
Therefore, a source tree is also called a shortest path tree
(SPT). For a multicast group, routers need to establish an SPT
from each multicast source that sends packets to the group.
In this example, there are two multicast sources (S1 and S2)
and two receivers (PC1 and PC2). Therefore, two source trees
are established on the network.
PIM routing entry
PIM routing entries are created by the PIM protocol to guide
multicast forwarding.
An (S, G) entry contains a known multicast source for a group,
and is used to establish an SPT on PIM routers. (S, G) entries
apply to both PIM-DM and PIM-SM networks.
If an (S, G) entry exists on a PIM router, the router forwards
multicast packets according to the (S, G) entry.

Shared tree
A shared tree is rooted at a rendezvous point (RP) and
combines shortest paths from the RP and all receivers. It is
therefore also called a rendezvous point tree (RPT). Each
multicast group has only one shared tree. All multicast sources
and receivers of a group send and receive multicast data packets
along the shared tree. A multicast source first sends data
packets to the RP, which then forwards the packets to all
receivers.
In this example, multicast sources S1 and S2 share one RPT.
PIM routing entry
PIM routing entries are created by the PIM protocol to guide
multicast forwarding.
A (*, G) entry contains a known multicast group, with the
multicast source unknown. It is used to establish an RPT on
PIM routers. (*, G) entries apply only to PIM-SM networks.
If no (S, G) entry is available and only a (*, G) entry exists on a
router, the router creates an (S, G) entry based on this (*, G)
entry, and then forwards multicast packets according to the (S,
G) entry.
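The fallback described above (prefer the (S, G) entry, otherwise derive one from the (*, G) entry) can be sketched as a table lookup. This is an illustrative model with a hypothetical table layout, not router code:

```python
def lookup_entry(pim_table, source, group):
    """Forwarding-entry selection sketch: prefer the (S, G) entry;
    with only a (*, G) entry present, derive a new (S, G) entry
    from it, as PIM-SM routers do."""
    if (source, group) in pim_table:
        return (source, group)
    if ("*", group) in pim_table:
        # Create (S, G) by inheriting the (*, G) entry's state
        pim_table[(source, group)] = dict(pim_table[("*", group)])
        return (source, group)
    return None  # no matching route: packet cannot be forwarded

table = {("*", "G1"): {"downstream": ["Eth1", "Eth2"]}}
print(lookup_entry(table, "S1", "G1"))        # (S1, G1) created from (*, G1)
print(table[("S1", "G1")]["downstream"])
```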

PIM DM overview
PIM-DM uses the push mode to forward multicast packets and
is often used on small-scale networks with densely distributed
multicast group members. PIM-DM assumes that each
network segment has multicast group members. When a
multicast source sends multicast packets, PIM-DM floods the
multicast packets to all PIM routers on the network and prunes
the branches with no members. PIM-DM establishes and
maintains a unidirectional loop-free source-specific shortest
path tree (SPT) through periodic flood-and-prune
processes. If a new group member connects to a leaf router on
a pruned branch, the router can initiate a grafting process to
restore multicast forwarding before the next flood-and-prune
process.
PIM-DM uses the following mechanisms: neighbor discovery, flooding,
pruning, grafting, assert, and state refresh. The flooding, pruning, and
grafting mechanisms are used to establish an SPT.

PIM routers send Hello messages through all PIM-enabled interfaces. The multicast packet encapsulating a Hello message has a destination
IP address of 224.0.0.13 (indicating all PIM routers on a network
segment), and the source IP address is the IP address of the interface
sending the multicast packet. The TTL value of the multicast packet is 1.
Hello messages are used to discover PIM neighbors, adjust PIM
protocol parameters, and maintain neighbor relationships.
Discovering PIM neighbors
PIM routers on the same network segment must
receive multicast packets with the destination address
224.0.0.13. By exchanging Hello messages, directly
connected PIM routers learn neighbor information and
establish neighbor relationships.
A PIM router can receive other PIM messages to
create multicast routing entries only after it establishes
neighbor relationships with other PIM routers.
Adjusting PIM protocol parameters
A Hello message carries the following PIM protocol
parameters to control PIM message exchange between
PIM neighbors:
DR_Priority: indicates the priority used by an
interface in DR election. The interface with the
highest priority becomes the DR. This
parameter is used for DR election only on PIM-SM networks.

Holdtime: indicates the timeout interval of a neighbor relationship. A PIM router considers its
neighbor reachable within the Holdtime interval.
LAN_Delay: indicates the delay in transmitting
Prune messages on a shared network segment.
Neighbor-Tracking: indicates the neighbor
tracking function.
Override-Interval: indicates the interval for
overriding a pruning operation.
Maintaining neighbor relationships
PIM routers periodically send Hello messages to each
other. If a PIM router does not receive any Hello
message from a PIM neighbor within the Holdtime
interval, the router considers the neighbor unreachable
and deletes the neighbor from the neighbor list.
Changes of PIM neighbors lead to changes in the
multicast network topology. If an upstream or
downstream neighbor in the multicast distribution tree
is unreachable, multicast routes need to re-converge,
and the multicast distribution tree will change.
IGMPv1 querier election
Routers on a PIM-DM network compare the priorities and IP
addresses carried in Hello messages to elect a DR for each
network segment. The DR functions as the IGMPv1 querier on
the network segment.
If the DR fails, neighboring routers trigger a new DR election
process when the Hello timeout timer expires.
Hello timers
The default Hello interval is 30 seconds.
The default Hello timeout interval is 105 seconds.

On a PIM-DM network, multicast packets sent from a multicast source are flooded throughout the entire network. When a PIM router receives
a multicast packet, the router performs an RPF check on the packet
against the unicast routing table. If the packet passes the RPF check,
the router creates an (S, G) entry, in which the downstream interface
list contains all the interfaces connected to downstream PIM neighbors.
The router then forwards subsequent multicast packets through each
downstream interface.
When multicast packets reach a leaf router, the leaf router processes
the packets as follows:
If the network segment connected to the leaf router has group
members, the leaf router adds its interface connected to the
network segment to the downstream interface list of the (S, G)
entry, and forwards subsequent multicast packets to the group
members.
If the network segment connected to the leaf router has no
group member and the leaf router does not need to forward
multicast packets to downstream PIM neighbors, the leaf
router initiates a pruning process.
Topology description
Multicast source S sends a multicast packet to multicast group
G.

When R1 receives the multicast packet, it performs an RPF check on the packet against the unicast routing table. After the
packet passes the RPF check, R1 creates an (S, G) entry, in
which the downstream interface list contains interfaces
connected to R2 and R5. R1 then forwards subsequent
packets to R2 and R5.
R2 receives the multicast packet from R1. After the packet
passes the RPF check, R2 creates an (S, G) entry, in which
the downstream interface list contains the interfaces
connected to R3 and R4. R2 then forwards subsequent
packets to R3 and R4.
R5 receives the multicast packet from R1. Because the
downstream network segment does not have group members
or PIM neighbors, R5 triggers a pruning process.
R3 receives the multicast packet from R2. After the packet
passes the RPF check, R3 creates an (S, G) entry, in which
the downstream interface list contains the interface connected
to PC1. R3 then forwards subsequent packets to PC1.
R4 receives the multicast packet from R2. Because the
downstream network segment does not have group members
or PIM neighbors, R4 triggers a pruning process.

When a PIM router receives a multicast packet, it performs an RPF check on the packet. If the packet passes the RPF check but the
downstream network segment does not have any group member, the
PIM router sends a Prune message to the upstream router. After
receiving the Prune message from the downstream interface, the
upstream router deletes the downstream interface from the downstream
interface list of the (S, G) entry. The multicast packets will not be
forwarded to this downstream interface. A pruning operation is initiated
by a leaf router. The Prune message is sent upstream hop by hop, and
PIM routers receiving the Prune message delete the downstream
interface from the (S, G) entry. Finally, the multicast distribution tree
contains only branches with group members.
A PIM router starts a prune timer (210 seconds by default) for the
pruned downstream interface and resumes multicast forwarding on the
interface after the timer expires. Multicast packets are then flooded on
the entire network, and new group members can receive multicast
packets. Subsequently, leaf routers without group members attached
trigger pruning processes. PIM-DM updates the SPT through periodic
flood-and-prune processes.

After a downstream interface of a leaf router is pruned:
If new members join the multicast group on the interface and
want to receive multicast packets before the next flood-and-prune
process, the leaf router initiates a grafting process.
If no member joins the multicast group and multicast
forwarding still needs to be suppressed on the interface, the
leaf router initiates a state refresh process.
Topology description
R5 sends a Prune message to R1 to notify R1 that the
downstream network segment no longer needs to receive
multicast data.
After receiving the Prune message, R1 stops forwarding data
through its downstream interface connecting to R5, and
deletes this downstream interface from the (S, G) entry. R1
has another downstream interface in forwarding state, so the
pruning process ends. Subsequent multicast packets are only
forwarded to R2.
R4 sends a Prune message to R2 to notify R2 that the
downstream network segment no longer needs to receive
multicast data.
After receiving the Prune message, R2 waits for 3 seconds
(LAN-delay + override-interval). R3 also receives the Prune
message sent by R4. Because R3 connects to a downstream
receiver, R3 sends a Join message to override the Prune
message.
After R2 receives the Join message, it ignores the Prune
message sent from R4 and continues forwarding multicast
traffic to the downstream interface.
The LAN-delay and override-interval are explained as follows:
Hello messages carry the LAN-delay and override-interval
parameters. The LAN-delay parameter specifies the packet
transmission delay (500 milliseconds by default), and the
override-interval specifies the interval during which
downstream routers can override a pruning operation (2500
milliseconds by default).
If a router sends a Prune message upstream but other routers
on the same network segment still need to receive multicast
data, they must send a Join message to override the pruning
operation within the override-interval.
If routers on a link are configured with different override-interval
values, the largest override-interval value among the routers
is used on the link.

The total of the LAN-delay and override-interval is the prune-pending timer (PPT). After a router receives a Prune message
from a downstream interface, it waits until the PPT expires
and then prunes the downstream interface. If the router receives
a Join message from the downstream interface before the PPT
expires, it cancels the pruning operation.
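The timer arithmetic and the prune-override decision described above can be sketched as follows. This is a minimal illustration, not router code; the function and parameter names are hypothetical, and the defaults are the values given in the text (LAN-delay 500 ms, override-interval 2500 ms).

```python
# Default timer values from the text above.
LAN_DELAY_MS = 500            # packet transmission delay
OVERRIDE_INTERVAL_MS = 2500   # window for downstream routers to override a prune

def prune_pending_timer_ms(lan_delay=LAN_DELAY_MS, override=OVERRIDE_INTERVAL_MS):
    """PPT = LAN-delay + override-interval (3000 ms with the defaults)."""
    return lan_delay + override

def should_prune(join_received_before_ppt_expiry):
    """The upstream router prunes the downstream interface only if no
    Join message arrived from that interface before the PPT expired."""
    return not join_received_before_ppt_expiry

print(prune_pending_timer_ms())        # 3000
print(should_prune(True))              # False: a Join overrides the Prune
```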

Multicast routers prune branches without group members to establish a
new SPT according to received Prune messages. Although routers no
longer forward multicast packets to pruned branches, the
corresponding (S, G) entry still exists on each router. Once new
members join the group on the pruned branches, the downstream
interfaces can be quickly added to the entry to resume multicast
forwarding.

PIM-DM uses the grafting mechanism to enable new group members
on a pruned network segment to rapidly obtain multicast data. A leaf
router can determine that a multicast group G has new members on a
network segment according to IGMP messages. The leaf router then
sends a Graft message to notify the upstream router that the
downstream network segment needs multicast data. After receiving the
Graft message, the upstream router adds the downstream interface to
the downstream interface list of the (S, G) entry.
A grafting process is initiated by a leaf router and ends on the router
that can receive multicast packets.
Topology description
Pruned downstream nodes can resume multicast forwarding
when the prune timer expires, but they must wait for 210
seconds before the prune timer expires. This is quite a long
time for new group members. To reduce the waiting time, a
pruned downstream router can send a Graft message to notify
the upstream router.
When the network segment connected to R5 has a new group
member, R5 sends a Graft message towards the multicast
source S. When R1 receives the Graft message, it replies with
a Graft ACK message. After that, multicast data can be
forwarded to the previously pruned branch.

To prevent pruned interfaces from resuming multicast forwarding after
the prune timer expires, the first-hop router nearest to the multicast
source periodically sends a State-Refresh message throughout the
entire PIM-DM network. Other PIM routers reset the prune timer after
receiving the State-Refresh message. In this way, pruned downstream
interfaces remain suppressed if leaf routers connected to the interfaces
have no new group members attached.
Topology description
R1 sends a State-Refresh message to R2 and R5 to initiate a
state refresh process.
R5 has a pruned interface and resets the prune timer on the
interface. If R5 still has no group member on the connected
network segment when the next flood-and-prune process
starts, the pruned interface is still suppressed.

If multiple PIM routers forward multicast packets to the same network
segment after the multicast packets pass the RPF check, only one PIM
router can be selected through the assert mechanism to forward
multicast packets to the network segment. When a PIM router receives
a multicast packet that is the same as the multicast packet it sends to
other neighbors, the PIM router sends an Assert message with the
destination address 224.0.0.13 to all other PIM routers on the same
network segment. When the other PIM routers receive the Assert
message, they compare local parameters with those carried in the
Assert message for assert election. The assert election is performed
according to the following rules:
The router whose unicast route to the multicast source has the
highest routing protocol priority wins.
If these routers have the same priority, the router with the
smallest route cost to the multicast source wins.
If these routers have the same priority and the same route cost
to the multicast source, the router with the largest downstream
interface IP address wins.
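The three-step comparison above is a lexicographic ranking, which can be sketched as a single tuple comparison. This is an illustrative simplification with hypothetical data structures, not an implementation of the PIM Assert message handling itself.

```python
import ipaddress

def assert_winner(routers):
    """Assert election sketch following the rules in the text:
    routers is a list of (name, protocol_priority, route_cost, downstream_ip).
    Higher priority wins; then smaller cost; then larger IP address."""
    return max(
        routers,
        key=lambda r: (r[1], -r[2], int(ipaddress.IPv4Address(r[3]))),
    )[0]

# Same priority, R2 has the smaller cost to the source, so R2 wins.
print(assert_winner([("R2", 100, 10, "10.0.0.2"),
                     ("R3", 100, 20, "10.0.0.3")]))   # R2
```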
The PIM routers perform the following operations based on assert
election results:
The downstream interface of the router that wins the election is
the assert winner and forwards multicast packets to the shared
network segment.

The downstream interfaces of the PIM routers that lose the
election are assert losers and no longer forward multicast
packets to the shared network segment. The PIM routers
delete the downstream interfaces from the downstream
interface list of their (S, G) entries.
After the assert election is complete, only one downstream
interface is active on the network segment, so only one copy of
multicast packets is transmitted to the network segment. All
assert losers can resume multicast packet forwarding after a
specified interval (180 seconds by default), triggering periodic
assert elections.
Topology description
In this example, R2 has a smaller cost to the multicast source
than R3.
R2 and R3 receive a multicast packet from each other through
their downstream interfaces, but both the packets fail the RPF
check and are dropped. R2 and R3 then send an Assert
message to the network segment.
R2 compares its routing information with that carried in the
Assert message sent by R3 and finds that its own route cost to
the multicast source is smaller. Therefore, R2 wins the election.
R2 continues forwarding multicast packets to the network
segment, whereas R3 drops subsequent multicast packets
because these packets fail the RPF check.
R3 compares its routing information with that carried in the
Assert message sent by R2 and finds that its own route cost
to the multicast source is larger. Therefore, R3 loses the
election. R3 then blocks multicast forwarding on its
downstream interface and deletes the interface from the
downstream interface list of the (S, G) entry.

PIM-SM applies to the any-source multicast (ASM) and source-specific
multicast (SSM) models. In the ASM model, PIM-SM uses the pull
mode to forward multicast packets. This mode is used in networks with
a lot of sparsely distributed group members. PIM-SM is implemented as
follows:
A PIM router works as the rendezvous point (RP) to serve
group members or multicast sources that appear on the
network. All PIM routers on the network know the RP's position.
When a new group member appears on the network (a host
sends an IGMP message to request to join a multicast group
G), the last-hop router sends a Join message to the RP. The
Join message is transmitted hop by hop, and all the routers
receiving the message create a (*, G) entry. Finally, an RPT
rooted at the RP is set up.
When an active multicast source appears on the network (the
multicast source sends the first multicast packet to a multicast
group G), the first-hop router encapsulates the multicast data
in a Register message and sends the Register message to the
RP in unicast mode. The RP then creates an (S, G) entry, and
the multicast source is registered on the RP.
PIM-SM uses the following mechanisms in the ASM model: neighbor
discovery, DR election, RP discovery, RPT setup, multicast source
registration, SPT switchover, pruning, and assert. You can also
configure a bootstrap router (BSR) to implement fine-grained
management in a PIM-SM domain.

The network segment of a multicast source or receivers may connect to
multiple PIM routers. The PIM routers exchange Hello messages to set
up PIM neighbor relationships. The Hello message sent by a router
carries the DR priority of the router and IP address of the interface
connected to the network segment. Each PIM router compares its own
information with the information carried in the Hello messages received
from its neighbors. The DR elected among the PIM routers is
responsible for forwarding multicast packets for the multicast source or
receivers. The DR is elected according to the following rules:
The PIM router with the highest DR priority wins (all routers on
the network segment support the DR priority).
If PIM routers have the same DR priority or at least one PIM
router does not allow the DR priority field in Hello messages,
the PIM router with the largest IP address wins.
If the current DR fails, other PIM routers trigger a new DR
election when the PIM neighbor timeout timer expires (105
seconds by default).
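The DR election rules above can be expressed compactly. The sketch below uses hypothetical neighbor records; a DR priority of None models a router that does not support the DR-priority option, which forces the election to fall back to IP-address comparison for all routers on the segment.

```python
import ipaddress

def elect_dr(neighbors):
    """DR election sketch: neighbors is a list of
    (name, dr_priority_or_None, interface_ip) tuples."""
    if any(prio is None for _, prio, _ in neighbors):
        # At least one router does not support the DR priority field:
        # compare interface IP addresses only.
        return max(neighbors,
                   key=lambda n: int(ipaddress.IPv4Address(n[2])))[0]
    # All routers support the priority: highest priority wins,
    # largest IP address breaks ties.
    return max(neighbors,
               key=lambda n: (n[1], int(ipaddress.IPv4Address(n[2]))))[0]

print(elect_dr([("R1", 1, "192.0.2.1"), ("R2", 10, "192.0.2.2")]))  # R2
```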
In the ASM model, the DR provides the following functions:
The DR on the shared network segment connected to a
multicast source sends Register messages to the RP. This DR
is called the source DR.
The DR connected to the shared network segment of group
members sends Join messages to the RP. This DR is called
the receiver DR.

On a PIM-SM network, an RPT is a multicast distribution tree with the
RP as the root and PIM routers that have group memberships as
leaves. In the topology shown in the figure, when a group member
appears on the network (a user sends an IGMP message to join a
multicast group G), the receiver DR sends a Join message to the RP.
The Join message is transmitted hop by hop, and routers receiving the
message create a (*, G) entry. Finally, an RPT rooted at the RP is set
up.

On a PIM-SM network, any new multicast source must register on the
RP so that the RP can forward multicast data from the multicast source
to group members. The multicast source registration process is as
follows:
A multicast source sends a multicast packet to the source DR
(R1).
After receiving the multicast packet, the source DR
encapsulates the multicast packet into a Register message
and sends the Register message to the RP (R2).
The RP decapsulates the received Register message, creates
an (S, G) entry, and forwards the multicast packet to group
members along the RPT.
The RP no longer needs any Register message sent from R1,
so it sends a Register-Stop message to R1. R1 then stops
sending Register messages to the RP.

On a PIM-SM network, each multicast group can have only one RP and one
RPT. Before an SPT switchover, all multicast packets destined for a
multicast group must be encapsulated in Register messages and then
sent to the RP. The RP decapsulates Register messages and forwards
multicast packets along the RPT. All multicast packets pass through the
RP. As the rate of multicast packets increases, the RP faces heavy
loads. To resolve this problem, PIM-SM allows the RP or the receiver
DR to trigger an SPT switchover.
SPT switchover conditions
When the multicast traffic rate exceeds the specified threshold,
PIM-SM triggers an RPT-to-SPT switchover.
According to the default configuration of the VRP, routers
connected to receivers join the SPT immediately after
receiving the first multicast data packet from a multicast source.

The receiver DR periodically checks the rate of multicast packets for an
(S, G) and triggers an SPT switchover when the rate exceeds the
specified threshold.
The receiver DR sends a Join message to the source DR. The
Join message is transmitted hop by hop, and routers receiving
the message create an (S, G) entry. Finally, an SPT is set up
from the source DR to the receiver DR.

After the SPT is set up, the receiver DR sends a Prune
message to the RP. The Prune message is transmitted hop by
hop along the RPT, and routers receiving the message delete
their downstream interfaces from the (S, G) entry. After the
pruning process is complete, the RP no longer forwards
multicast packets along the RPT.
If the SPT does not pass through the RP, the RP continues to
send a Prune message to the source DR, so that routers along
the path between the RP and source DR delete their
downstream interfaces from the (S, G) entry. After the pruning
process is complete, the source DR no longer forwards
multicast packets along the SPT to the RP.

On a PIM-SM network, the root of a shared tree is an RP.
An RP provides the following functions:
Forwards all multicast packets transmitted in the shared tree to
receivers.
Forwards multicast data of several or all multicast groups. A
network can have one or multiple RPs. You can configure an
RP to serve multicast groups in a specified range. An RP can
serve multiple multicast groups, but each multicast group can
have only one RP. Multicast packets sent from a multicast
source to all receivers of a group are aggregated on the RP.
RP discovery:
Static RP: A static RP address is specified on all PIM routers
in the PIM domain using the static-rp rp-address command.
Dynamic RP: Several PIM routers in a PIM domain are
configured as candidate-RPs (C-RPs), among which an RP is
elected. Candidate bootstrap routers (C-BSRs) also need to
be configured. A BSR is elected among the C-BSRs.
An RP is the core router in a PIM-SM domain. If a small and simple
network needs to transmit light multicast traffic and one RP is enough,
you can specify the RP address statically on all routers in the PIM-SM
domain. In most cases, PIM-SM networks have a large scale and need
to transmit heavy multicast traffic. To reduce loads on each RP and
optimize shared tree topology, different multicast groups should have
different RPs. Dynamic RP election is required in this condition, and a
BSR is required for RP election.

During a BSR election, each C-BSR considers itself as the BSR and
sends a Bootstrap message to the entire network. The Bootstrap
message carries the C-BSR address and priority. Each PIM router
receives Bootstrap messages from all C-BSRs and compares C-BSR
information to elect a BSR. The BSR is elected according to the
following rules:
The C-BSR with the highest priority wins (a larger priority
value indicates a higher priority).
If C-BSRs have the same priority, the C-BSR with the largest
IP address wins.
An RP election process is as follows:
Each C-RP sends an Advertisement message to the BSR. An
Advertisement message carries the C-RP address, the range
of multicast groups the C-RP serves, and the C-RP priority.
The BSR summarizes the C-RP information in an RP-Set,
encapsulates the RP-Set in a Bootstrap message, and
advertises the message to all PIM-SM routers on the network.
PIM routers follow the same rules to compare RP information
in the RP-Set and elect an RP from multiple C-RPs for the
same group. The RP election rules are as follows:
The C-RP interface with the longest address mask
wins.
The C-RP with the highest priority wins (a larger
priority value indicates a lower priority).

If C-RPs have the same priority, routers use a hash
algorithm, and the C-RP with the largest hash value
wins.
If all the preceding parameters are the same, the C-RP
with the largest IP address wins.
All PIM routers use the same RP-Set and election rules, so
they obtain mappings between RPs and multicast groups. The
PIM routers save the mappings for subsequent multicast
forwarding.
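The four-step C-RP comparison above is again a lexicographic ranking. The sketch below treats the hash step as a precomputed input value, since the exact PIM hash function is not reproduced here; the data layout is hypothetical. Note that for C-RPs a larger priority value means a lower priority, hence the negation.

```python
import ipaddress

def elect_rp(candidates):
    """C-RP election sketch for one multicast group, per the rules above:
    candidates is a list of (name, group_mask_len, priority, hash_value, ip).
    Longest mask wins; then lowest priority value; then largest hash;
    then largest IP address."""
    return max(
        candidates,
        key=lambda c: (c[1], -c[2], c[3], int(ipaddress.IPv4Address(c[4]))),
    )[0]

# B serves a more specific group range (longer mask), so B wins
# even though its priority value (192) is worse than A's (0).
print(elect_rp([("A", 8, 0, 5, "10.1.1.1"),
                ("B", 16, 192, 9, "10.1.1.2")]))   # B
```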

The SSM model is implemented based on PIM-SM and
IGMPv3/MLDv2. In this model, an SPT can be established from a
multicast source to group members without the need to maintain an RP,
establish an RPT, or register the multicast source.
In the SSM model, hosts can determine the location of the multicast
sources. Therefore, they can specify the multicast sources from which
they want to receive multicast data when joining a multicast group.
After the receiver DR receives the request from a host, it sends a Join
message to the source DR. The Join message is then transmitted
upstream hop by hop. An SPT is then set up from the multicast source
to the host.
In the SSM model, PIM-SM uses the following mechanisms: neighbor
discovery, DR election, and SPT setup.
An SPT setup process is as follows:
R3 and R5 learn that hosts in the same multicast group
request data from different multicast sources through IGMPv3.
Therefore, R3 and R5 send Join messages toward the sources.
PIM routers that receive the Join message create (S1, G) and
(S2, G) entries according to the Join message. In this way,
they set up an SPT from S1 to PC1 and an SPT from S2 to PC2.
Multicast packets from the two multicast sources are then
forwarded to the respective receivers along the SPTs.

RPF check
When a router receives a multicast packet, it searches the
unicast routing table for the route to the source address of the
packet. After finding the route, the router checks whether the
outbound interface of the route is the same as the inbound
interface of the multicast packet. If they are the same, the
router considers that the multicast packet is received from a
correct interface. This process is called an RPF check, which
ensures correct forwarding paths for multicast packets.
If multiple equal-cost routes are available, the route with the
largest next-hop address is used as the RPF route.
RPF checks can be performed based on unicast routes,
Multiprotocol Border Gateway Protocol (MBGP) routes, or
static multicast routes. The priority order of these routes is
static multicast routes > MBGP routes > unicast routes.
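The RPF check logic can be sketched as a longest-prefix lookup followed by an interface comparison. This is a toy model with a hypothetical routing-table structure (a list of prefix/interface pairs), not a real forwarding-plane implementation, and it ignores the route-type priority order described above.

```python
import ipaddress

def rpf_check(routes, source_ip, inbound_if):
    """RPF check sketch: find the longest-prefix route to the multicast
    source and compare its outbound interface with the interface the
    packet actually arrived on.
    routes: list of (prefix, out_interface) pairs."""
    src = ipaddress.IPv4Address(source_ip)
    matches = [(ipaddress.IPv4Network(p), i) for p, i in routes
               if src in ipaddress.IPv4Network(p)]
    if not matches:
        return False                      # no route to the source
    best = max(matches, key=lambda m: m[0].prefixlen)
    return best[1] == inbound_if

# Values from the topology description: the route to 152.10.2.2 points
# out of S0, so a packet arriving on S1 fails the RPF check.
routes = [("152.10.0.0/16", "S0")]
print(rpf_check(routes, "152.10.2.2", "S1"))  # False
print(rpf_check(routes, "152.10.2.2", "S0"))  # True
```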
Topology description
A multicast stream sent from the source 152.10.2.2 arrives at
interface S1 of the router. The router checks the routing table
and finds that the multicast stream from this source should
arrive at interface S0. Therefore, the RPF check fails and the
multicast stream is dropped by the router.
A multicast stream sent from the source 152.10.2.2 arrives at
interface S0 of the router. The router checks the routing table
and finds that the RPF interface is also S0. The RPF check
succeeds, and the multicast stream is correctly forwarded.

Static multicast routing
For R3, the RPF neighbor towards the multicast source
(Source) is R1. Therefore, multicast packets sent from Source
are forwarded along the path Source -> R1 -> R3. If you
configure a multicast static route on R3 and specify R2 as the
RPF neighbor, the transmission path of multicast packets sent
from Source changes to Source-> R1-> R2-> R3. The
multicast path then diverges from the unicast path.

Case description
In this case, interconnection IP addresses are configured
according to the following rule:
If RTX connects to RTY, their interface IP addresses
used to connect to each other are XY.1.1.X and
XY.1.1.Y, with a 24-bit network mask.

Command usage
The multicast routing-enable command enables the
multicast routing function.
The pim dm command enables PIM-DM on an interface.
The pim hello-option dr-priority command sets the DR priority
for a PIM interface.
The igmp enable command enables IGMP on an interface.
The igmp version command specifies the IGMP version
running on an interface.
Precautions
In this network topology, R2 is the IGMP querier, and R3
forwards multicast packets to downstream receivers because
R3 is the assert winner.
The display pim routing-table command displays entries in
the PIM routing table.
The display pim routing-table fsm command displays
detailed information about the finite state machine (FSM) in the
PIM routing table.

Case description
The network topology is the same as that in PIM-DM
configuration. The network runs PIM-SM, and the transmission
scope of Bootstrap messages needs to be limited.

Command usage
The pim sm command enables PIM-SM on an interface.
The c-rp command configures a router to notify the BSR that it
is a C-RP.
The c-bsr command configures a C-BSR.
The pim bsr-boundary command configures the BSR
boundary of the PIM-SM domain on an interface.
Precautions
In this network topology, R2 is the IGMP querier, and R3
forwards multicast packets to downstream receivers because
R3 is the assert winner.
The display pim routing-table command displays entries in
the PIM routing table.
The display pim routing-table fsm command displays
detailed information about the FSM in the PIM routing table.

The method for checking the SPT in a PIM-SM network is similar to the
method for checking the RPT.

Case description
In this case, interconnection IP addresses are configured
according to the following rules:
If RTX connects to RTY, their interface IP addresses
used to connect to each other are XY.1.1.X and
XY.1.1.Y, with a 24-bit network mask.
The loopback interface address of RTX is X.X.X.X/32.

Pre-configuration
This page provides the basic OSPF configuration. In this case,
R1 is the DR in the FR network.

Results:
A Bootstrap message is transmitted from R1 to R2 and fails
the RPF check on R2, so R2 drops the message. To enable
Bootstrap messages to be forwarded by R2, configure a static
multicast route on R2 to change the RPF path.

Results:
The ACL restricts the multicast address range.

IPv6 characteristics are as follows:
Address space: An IPv6 address is 128 bits long. A 128-bit
address structure allows for 2^128 (4.3 billion x 4.3 billion x 4.3
billion x 4.3 billion) possible addresses. The biggest advantage
of IPv6 is its almost infinite address space.
Packet format: IPv6 uses a new protocol header format rather
than increasing the bits in the address field of an IPv4 packet
to 128 bits. The IPv6 data packets carry new packet headers.
An IPv6 packet header includes IPv6 basic and extension
headers. Some optional fields are moved to the extension
header following the IPv6 header. This enables intermediate
routers on the network to process IPv6 packet headers more
efficiently.
Autoconfiguration and readdressing: IPv6 provides address
autoconfiguration, which allows hosts to automatically discover
networks and obtain IPv6 addresses. This significantly
improves network manageability.
Hierarchical network structure: A huge address space allows
for the hierarchical network design in IPv6. The hierarchical
network design facilitates route summarization and improves
forwarding efficiency.
End-to-end security support: IPv6 supports IP Security (IPSec)
authentication and encryption at the network layer, so it
provides end-to-end security.

Quality of Service (QoS) support: IPv6 defines the Flow Label
field in the packet header. This field enables network routers to
differentiate data flows and provide special processing for the
identified data flows. With this field, the routers can identify
data flows without checking the inner data packets being
transmitted. In this way, QoS can be implemented even if the
valid payloads of data packets are encrypted.
Mobility: With the support of the Routing header and Destination
Options header, IPv6 provides built-in mobility.

It should be noted that an IPv6 address can contain only one double
colon (::). Otherwise, a computer cannot determine the number of zeros
in a group when restoring the compressed address to the original 128-bit address.
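The single-double-colon rule can be observed with Python's standard ipaddress module, shown here purely as an illustration (the module is not part of the original material): it compresses the longest run of zero groups into one ::, and rejects addresses containing two.

```python
import ipaddress

# One :: compresses unambiguously and can be restored.
addr = ipaddress.IPv6Address("2001:0db8:0000:0000:0000:0000:0000:0001")
print(addr.compressed)  # 2001:db8::1
print(addr.exploded)    # 2001:0db8:0000:0000:0000:0000:0000:0001

# Two :: are ambiguous and are therefore rejected.
try:
    ipaddress.IPv6Address("2001::1::2")
except ValueError as e:
    print("invalid:", e)
```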

If the first 3 bits of an IPv6 unicast address are not 000, the interface ID
must be of 64 bits. If the first 3 bits are 000, there is no such limitation.
IEEE EUI-64 standards
The length of an interface ID is 64 bits. IEEE EUI-64 defines a
method to convert a 48-bit MAC address into a 64-bit IPv6
interface ID. In the MAC address, the c bits indicate the vendor ID,
the d bits indicate the vendor-assigned number, the 0 bit is the
universal/local (U/L) bit, and the g bit specifies whether the
address identifies a single host or a host group. The conversion
algorithm is as follows: invert the U/L bit (from 0 to 1) and insert
the two bytes FFFE between the c bits and the d bits.
The method for converting MAC addresses into IPv6 interface
IDs reduces the configuration workload. When stateless
address autoconfiguration (stateless address
autoconfiguration will be explicated in the following pages) is
used, you only need an IPv6 network prefix before obtaining
an IPv6 address.
The defect of this method is that an IPv6 address can be easily
calculated based on a MAC address.
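The EUI-64 conversion described above (insert FFFE in the middle, invert the universal/local bit) is mechanical enough to sketch directly. The function name and output formatting are illustrative choices, not a vendor API.

```python
def eui64_interface_id(mac):
    """Convert a 48-bit MAC address into a 64-bit IPv6 interface ID:
    split the MAC in half, insert the bytes FF FE between the halves,
    and invert the universal/local bit of the first byte."""
    octets = [int(b, 16) for b in mac.replace("-", ":").split(":")]
    octets[0] ^= 0x02                       # invert the U/L bit
    eui = octets[:3] + [0xFF, 0xFE] + octets[3:]
    # Group the 8 bytes into four 16-bit hexadecimal fields.
    return ":".join(f"{(eui[i] << 8) | eui[i + 1]:04x}"
                    for i in range(0, 8, 2))

print(eui64_interface_id("00:1B:44:11:3A:B7"))  # 021b:44ff:fe11:3ab7
```

A full stateless autoconfigured address would then be the advertised /64 prefix followed by this interface ID.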

IPv4 addresses are classified into unicast, multicast, and broadcast
addresses. Compared to IPv4, IPv6 has no broadcast address and
introduces a new address type: anycast address. IPv6 addresses are
classified into unicast, multicast, and anycast addresses.
An IPv6 unicast address identifies an interface. Packets sent
to an IPv6 unicast address are delivered to the interface
identified by the unicast address.
An IPv6 multicast address identifies a group of interfaces.
Packets sent to an IPv6 multicast address are delivered to all
the interfaces identified by the multicast address.
An IPv6 anycast address identifies multiple interfaces. Packets
sent to an anycast address are delivered to the nearest
interface that is identified by the anycast address, depending
on the routing protocols. In fact, anycast addresses and
unicast addresses use the same address space. The router
determines whether to send a packet in unicast mode or
anycast mode.

Global unicast address
An IPv6 global unicast address is an IPv6 address with a
global unicast prefix, which is similar to an IPv4 public address.
IPv6 global unicast addresses support route prefix
summarization, helping limit the number of global routing
entries.
A global unicast address consists of a global routing prefix,
subnet ID, and interface ID.
Global routing prefix: is assigned by a service provider
to an organization. A global routing prefix is of at least
48 bits. Currently, the first 3 bits of all the assigned
global routing prefixes are 001.
Subnet ID: is used by organizations to construct a local
network (site). There are a maximum of 64 bits for both
the global routing prefix and subnet ID. It is similar to
an IPv4 subnet number.
Interface ID: refers to the interface identifier. It can be
used to identify a device (host).

Link-local address
Link-local addresses have a limited application scope. An IPv6
link-local address can be used only for communication
between nodes on the same link. A link-local address uses a
link-local prefix FE80::/10 as the first 10 bits (1111111010 in
binary) and an interface ID as the last 64 bits.
When IPv6 runs on a node, each interface of the node is
automatically assigned a link-local address that consists of a
fixed prefix and an interface ID in EUI-64 format. This
mechanism enables two IPv6 nodes on the same link to
communicate without any additional configuration. Therefore,
link-local addresses are widely used in neighbor discovery and
stateless address autoconfiguration.
Routing devices do not forward IPv6 packets with the link-local
address as a source or destination address to devices on non-local links.

Unique local address
Unique local addresses are used only within a site. Site-local
addresses are deprecated in RFC 3879 and replaced by
unique local addresses in RFC 4193.
Unique local addresses are similar to IPv4 private addresses.
Any organization that does not obtain a global unicast address
from a service provider can use a unique local address.
Unique local addresses are routable only within a local
network but not on the Internet.
Fields in a unique local address can be described as follows:
Prefix: is fixed as FC00::/7.
L: is set to 1 if the address is valid within a local
network. The value 0 is reserved for future expansion.
Global ID: indicates a globally unique prefix, which is
pseudo-randomly allocated (for details, see RFC 4193).
Subnet ID: identifies a subnet within the site.
Interface ID: identifies an interface.
A unique local address has the following characteristics:
Has a globally unique prefix. The prefix is pseudo-randomly allocated and has a high probability of
uniqueness.
Allows private connections between sites without
creating address conflicts.

Has a well-known prefix (FC00::/7) that allows for easy
route filtering by edge routers.
Does not conflict with any other addresses or cause
Internet route conflicts if it is leaked outside of the site
through routing.
Functions as a global unicast address to upper-layer
applications.
Is independent of the Internet Service Provider (ISP).

Unspecified address
An IPv6 unspecified address is 0:0:0:0:0:0:0:0/128 or ::/128,
indicating that an interface or a node does not have an IP
address. It can be used as the source IP address of some
packets, such as Neighbor Solicitation (NS) message in
duplicate address detection. Devices do not forward the
packets with the source IP address as an unspecified address.
Loopback address
An IPv6 loopback address is 0:0:0:0:0:0:0:1/128 or ::1/128.
Similar to IPv4 loopback address 127.0.0.1, the IPv6 loopback
address is used when a node needs to send IPv6 packets to
itself. This IPv6 loopback address is usually used as the IP
address of a virtual interface (a loopback interface for
example). The loopback address cannot be used as the
source or destination IP address of packets that need to be
forwarded.

IPv6 multicast address
Like an IPv4 multicast address, an IPv6 multicast address
identifies a group of interfaces, which usually belong to
different nodes. A node may belong to any number of multicast
groups. Packets sent to an IPv6 multicast address are
delivered to all the interfaces identified by the multicast
address.
An IPv6 multicast address is composed of a prefix, flag, scope,
and group ID (global ID):
Prefix: is fixed as FF00::/8 (1111 1111).
Flag: is 4 bits long. Currently, only the last bit is used.
The high-order 3 bits are reserved and must be set to
0s. The last bit 0 indicates a permanently-assigned
multicast address allocated by the Internet Assigned
Numbers Authority (IANA). The last bit 1 indicates a
non-permanently-assigned (transient) multicast
address.
Scope: is 4 bits long. It limits the scope where multicast
data flows are sent on the network.
Group ID (global ID): is 112 bits long. It identifies a
multicast group. RFC 2373 does not define all the 112
bits as a group ID but recommends using the low-order
32 bits as the group ID and setting all the remaining 80
bits to 0s.
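The prefix/flag/scope/group-ID layout described above can be decomposed with simple bit arithmetic. This is an illustrative helper (the function name is hypothetical), using Python's standard ipaddress module to parse the address.

```python
import ipaddress

def multicast_fields(addr):
    """Split an IPv6 multicast address into (flag, scope, group_id)
    per the layout above: 8-bit FF prefix, 4-bit flag, 4-bit scope,
    112-bit group ID."""
    a = int(ipaddress.IPv6Address(addr))
    if a >> 120 != 0xFF:
        raise ValueError("not an IPv6 multicast address (prefix FF00::/8)")
    flag = (a >> 116) & 0xF
    scope = (a >> 112) & 0xF
    group_id = a & ((1 << 112) - 1)
    return flag, scope, group_id

# FF02::1 (all nodes): flag 0 (permanently assigned), scope 2 (link-local).
print(multicast_fields("FF02::1"))  # (0, 2, 1)
```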


IPv6 anycast address
Anycast addresses are exclusive to IPv6. An anycast address
identifies a group of interfaces, and this group of interfaces
often belong to different nodes. Packets sent to an anycast
address are delivered to the nearest interface that is identified
by the anycast address, depending on the routing protocols.
The IPv6 anycast addresses can be used in one-to-one-of-many
communications. The receiver can be one interface of a
group. For example, a mobile subscriber needs to connect to
the nearest receive station. Using anycast addresses, the
mobile subscriber is not limited by physical locations.
Anycast addresses are allocated from the unicast address
space, using any of the unicast address formats. Thus,
anycast addresses are syntactically indistinguishable from
unicast addresses. The nodes to which an anycast address is
assigned must be explicitly configured to know that it is an
anycast address. Currently, anycast addresses are used only
as destination addresses, and are assigned to only routers.
A subnet-router anycast address is predefined in RFC 3513.
The interface ID of a subnet-router anycast address is all 0s.
Packets addressed to a subnet-router anycast address are
delivered to a certain router (the nearest router that is
identified by the address) in the subnet specified by the prefix
of the address. The nearest router is defined as being closest
in terms of routing distance.
An IPv6 packet has three parts: an IPv6 basic header, one or more
IPv6 extension headers, and an upper-layer protocol data unit (PDU).
IPv6 basic header
Each IPv6 packet must have an IPv6 basic header,
which is fixed as 40 bytes long.
The IPv6 basic header provides basic packet
forwarding information and will be parsed by all routers
on the forwarding path.
Extension headers
An IPv6 extension header is an optional header that
may follow the IPv6 basic header. An IPv6 packet may
carry zero, one, or more extension headers. The
extension headers may be different in lengths. The
IPv6 header and IPv6 extension header replace the
IPv4 header and its options. The IPv6 extension
header enhances IPv6 functions and has great
extensibility. Unlike the Options of an IPv4 header, the
maximum length of an IPv6 extension header is not
limited. Therefore, an IPv6 extension header can
contain all the extension data required by IPv6
communications.
The extension information about packet forwarding in
an IPv6 extension header is not parsed by all the
routers on the path, and is generally parsed by only the
destination router.
Upper-layer protocol data unit
An upper-layer PDU is composed of the upper-layer
protocol header and its payload such as an ICMPv6
packet, a TCP packet, or a UDP packet.
Fields in an IPv6 packet header are described as follows:
Version: is 4 bits long. In IPv6, the Version field value is 6.
Traffic Class: is 8 bits long. It indicates the class or priority of
an IPv6 packet. The Traffic Class field is similar to the TOS
field in an IPv4 packet and is mainly used in QoS control.
Flow Label: is 20 bits long. This field is added in IPv6 to
differentiate traffic. A flow label and source IP address identify
a data flow. Intermediate network devices can effectively
differentiate data flows based on this field.
Payload Length: is 16 bits long, which indicates the length of
the IPv6 payload. The payload is the rest of the IPv6 packet
following the basic header, including the extension header and
upper-layer PDU. This field indicates only the payload with the
maximum length of 65535 bytes. If the payload length exceeds
65535 bytes, the field is set to 0. The payload length is
expressed by the Jumbo Payload option in the Hop-by-Hop
Options header.
Next Header: is 8 bits long. This field identifies the type of the
first extension header that follows the IPv6 basic header or the
protocol type in the upper-layer PDU.
Hop Limit: is 8 bits long. This field is similar to the Time to Live
field in an IPv4 packet, defining the maximum number of hops
that an IP packet can pass through. The field value is
decremented by 1 by each router that forwards the IP packet.
When the field value becomes 0, the packet is discarded.
Source Address: is 128 bits long, which indicates the address
of the packet originator.
Destination Address: is 128 bits long, which indicates the
address of the packet recipient.
IPv6 extension header
An IPv4 packet header has an optional field (Options), which
includes security, timestamp, and record route options. The
variable length of the Options field makes the IPv4 packet
header length range from 20 bytes to 60 bytes. When routers
forward IPv4 packets with the Options field, many resources
need to be used. Therefore, these IPv4 packets are rarely
used in practice.
IPv6 uses extension headers to replace the Options field in the
IPv4 header. Extension headers are placed between the IPv6
basic header and upper-layer PDU. An IPv6 packet may carry
zero, one, or more extension headers. The sender of a packet
adds one or more extension headers to the packet only when
the sender requests other routers or the destination device to
perform special handling. Unlike IPv4, IPv6 has variable-length
extension headers, which are not limited to 40 bytes. This
facilitates further extension. To improve extension header
processing efficiency and transport protocol performance, IPv6
requires that the extension header length be an integer
multiple of 8 bytes.
When multiple extension headers are used, the Next Header
field of an extension header indicates the type of the next
header following this extension header.
An IPv6 extension header contains the following fields:
Next Header: is 8 bits long. It is similar to the Next Header field
in the IPv6 basic header, indicating the type of the next
extension header (if existing) or the upper-layer protocol type.
Extension Header Len: is 8 bits long. It indicates the
length of the extension header in 8-byte units, not
including the first 8 bytes.
Extension Header Data: is of variable length. It includes
a series of options and a padding field.
Each extension header can only occur once in an IPv6 packet, except
for the Destination Options header. The Destination Options header
may occur at most twice (once before a Routing header and once
before the upper-layer header).
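The Next Header chaining rule can be sketched as a minimal walker over a few well-known extension header types (a simplified illustration, not a complete parser):

```python
def walk_extension_headers(next_header: int, payload: bytes):
    """Follow the Next Header chain through common extension headers.
    Each extension header starts with Next Header (1 byte) and a length
    byte in 8-byte units, not counting the first 8 bytes."""
    EXT_HEADERS = {0: "Hop-by-Hop", 43: "Routing", 44: "Fragment",
                   60: "Destination Options"}
    chain, offset = [], 0
    while next_header in EXT_HEADERS:
        chain.append(EXT_HEADERS[next_header])
        nh, len_byte = payload[offset], payload[offset + 1]
        # For the Fragment header this byte is reserved (0), so the
        # computed 8-byte length still comes out right.
        offset += 8 * (len_byte + 1)
        next_header = nh
    return chain, next_header  # the remaining value is the upper-layer protocol

# Hop-by-Hop (8 bytes) -> Routing (8 bytes) -> TCP (protocol 6)
payload = bytes([43, 0] + [0] * 6) + bytes([6, 0] + [0] * 6)
print(walk_extension_headers(0, payload))  # (['Hop-by-Hop', 'Routing'], 6)
```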
The Internet Control Message Protocol version 6 (ICMPv6) is one of the
basic IPv6 protocols.
In IPv4, ICMP reports IP packet forwarding information and
errors to the source node. ICMP defines certain messages
such as Destination Unreachable, Packet Too Big, Time
Exceeded, and Echo Request or Echo Reply to facilitate fault
diagnosis and information management. In addition to the
common functions provided by ICMPv4, ICMPv6 provides
mechanisms such as Neighbor Discovery (ND), stateless
address configuration including duplicate address detection,
and Path Maximum Transmission Unit (PMTU) discovery.
The protocol number of ICMPv6 (that is, the value of the Next
Header field in an IPv6 packet) is 58.
Some fields in the packet are described as follows:
Type: specifies the message type. Values 0 to 127
indicate the error message type, and values 128 to 255
indicate the informational message type.
Code: indicates a specific message type.
Checksum: indicates the checksum of an ICMPv6
packet.
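One ICMPv6 detail worth showing in code is the Checksum field: unlike ICMPv4, the sum covers an IPv6 pseudo-header (source address, destination address, upper-layer packet length, and next-header value 58). A minimal sketch, with our own function name:

```python
import struct

def icmpv6_checksum(src: bytes, dst: bytes, icmp: bytes) -> int:
    """One's-complement checksum of the IPv6 pseudo-header plus the
    ICMPv6 message (whose own checksum field is zeroed beforehand)."""
    pseudo = src + dst + struct.pack("!I", len(icmp)) + b"\x00\x00\x00\x3a"
    data = pseudo + icmp
    if len(data) % 2:                      # pad to a 16-bit boundary
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                     # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Echo Request (Type 128, Code 0) between ::1 and ::1, checksum field zeroed
msg = bytes([128, 0, 0, 0, 0x12, 0x34, 0x00, 0x01])
loopback = bytes(15) + b"\x01"
print(hex(icmpv6_checksum(loopback, loopback, msg)))  # 0x6d86
```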
Destination Unreachable message
When a data packet fails to be sent to the destination node
or the upper-layer protocol, the router or destination node
sends an ICMPv6 Destination Unreachable message to the
source node. In an ICMPv6 Destination Unreachable message,
the value of the Type field is 1. The value of the Code field can
be 0, 1, 2, 3, or 4. Each value has a specific meaning
(defined in RFC 2463):
Code=0: No route to the destination device.
Code=1: Communication with the destination device is
administratively prohibited.
Code=2: Not assigned.
Code=3: Destination IP address is unreachable.
Code=4: Destination port is unreachable.
Packet Too Big message
If a data packet cannot be sent to the destination node
because the size of the packet exceeds the link MTU of the
outbound interface, the router sends an ICMPv6 Packet Too
Big message to the source node. The link MTU of the
outbound interface is carried in the message. PMTU discovery
is implemented based on Packet Too Big messages. In a
Packet Too Big message, the value of the Type field is 2 and
the value of the Code field is 0.
Time Exceeded message
If a router receives a packet with the hop limit being 0, it
discards the data packet and sends an ICMPv6 Time
Exceeded message to the source node. In a Time Exceeded
message, the value of the Type field is 3. The value of the
Code field can be 0 or 1.
Code=0: Hop limit exceeded in packet transmission
Code=1: Fragment reassembly timeout
Parameter Problem message
If an IPv6 node detects an error in the IPv6 packet header or
extension header, the IPv6 node discards the data packet and
sends an ICMPv6 Parameter Problem message to the source
node, specifying the location and type of the error. In a
Parameter Problem message, the value of the Type field is 4.
The value of the Code field can be 0, 1, or 2. The 32-bit Pointer
field indicates the location of the error. The Code field is
defined as follows:
Code=0: A field in the IPv6 basic header or extension
header is incorrect.
Code=1: The Next Header field in the IPv6 basic
header or extension header cannot be identified.
Code=2: Unknown options exist in the extension
header.
Echo Request messages
Echo Request messages are sent to destination nodes. After
receiving an Echo Request message, the destination node
responds with an Echo Reply message. In an Echo Request
message, the value of the Type field is 128 and the value of
the Code field is 0. The Identifier and Sequence Number fields
are configured by the source host to match the Echo Reply
messages and Echo Request messages.
Echo Reply messages
After receiving an Echo Request message, the destination
ICMPv6 node responds with an Echo Reply message. In an
Echo Reply message, the value of the Type field is 129 and
the value of the Code field is 0. The Identifier and Sequence
Number fields in the Echo Reply message are assigned the
same values as those in the Echo Request message.
IPv6 address resolution is completed at Layer 3. Layer 3 address
resolution brings the following advantages:
Layer 3 address resolution enables Layer 2 devices to use the
same address resolution protocol.
Layer 3 security mechanisms, for example, IPSec, are used to
prevent address resolution attacks.
Request packets are sent in multicast mode, reducing
performance requirements on Layer 2 networks.
Neighbor Solicitation (NS) packets and Neighbor Advertisement (NA)
packets are used during address resolution.
In an NS packet, the value of the Type field is 135 and the
value of the Code field is 0. An NS packet is similar to the ARP
Request packet in IPv4.
In an NA packet, the value of the Type field is 136 and the
value of the Code field is 0. An NA packet is similar to the ARP
Reply packet in IPv4.
The address resolution process is as follows:
PC1 needs to parse the link-layer address of PC2 before
sending packets to PC2. Therefore, PC1 sends an NS
message on the network.
In the NS message, the source IP address is the IPv6 address
of PC1, and the destination IP address is the solicited-node
multicast address of PC2 (composed of the prefix
FF02::1:FF00:0/104 and the last 24 bits of the corresponding
unicast address). The target address to be resolved is the
IPv6 address of PC2. This indicates that PC1 wants to know the link-layer
address of PC2. The Options field in the NS message carries
the link-layer address of PC1.
After receiving the NS message, PC2 replies with an NA
message. In the NA reply message, the source address is the
IPv6 address of PC2, and the destination address is the IPv6
address of PC1 (the NA message is sent to PC1 in unicast
mode using the link-layer address of PC1). The Options field
carries the link-layer address of PC2. This is the whole
address resolution process.
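The solicited-node multicast address construction used above can be shown in a few lines of Python (the function name is our own):

```python
import ipaddress

def solicited_node_address(unicast: str) -> str:
    """Derive the solicited-node multicast address used in NS messages:
    prefix FF02::1:FF00:0/104 plus the last 24 bits of the unicast address."""
    low24 = int(ipaddress.IPv6Address(unicast)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("FF02::1:FF00:0"))
    return str(ipaddress.IPv6Address(base | low24))

print(solicited_node_address("2001:db8::1234:5678"))  # ff02::1:ff34:5678
```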
An IPv6 unicast address that is assigned to an interface but has not
been verified by DAD is called a tentative address. An interface cannot
use the tentative address for unicast communication but will join two
multicast groups: the All-nodes multicast group and the
Solicited-node multicast group.
IPv6 DAD is similar to IPv4 gratuitous ARP. A node sends an NS
message whose target is the tentative address to the corresponding
Solicited-node multicast group. If the node receives an NA message
in reply, the tentative address is already in use by another node,
and the node will not use this tentative address for communication.
DAD process
An IPv6 address 2000::1 is assigned to PC1 as a tentative
IPv6 address. To check the validity of 2000::1, PC1 sends an
NS message to the Solicited-node multicast group to which
2000::1 belongs. The NS message contains the requested
address 2000::1. Because 2000::1 is still tentative, the source
address of the NS message is the unspecified address (::). After
receiving the NS message, PC2 processes the message in the
following ways:
If 2000::1 is one tentative address of PC2, PC2 will not
use this address as an interface address and not send
the NA message.
If 2000::1 is being used on PC2, PC2 sends an NA
message to the All-nodes multicast group to which the
address belongs. The NA message carries IP address
2000::1. In this way, PC1 can find that the tentative
address is duplicate after receiving the message and
will not use the address.
IPv6 supports stateless address autoconfiguration. Hosts obtain IPv6
prefixes and automatically generate interface IDs. Router Discovery is
the basis for IPv6 address autoconfiguration and is implemented
through the following two messages:
Router Advertisement (RA) message: Each router periodically
sends multicast RA messages that carry network prefixes and
identifiers on the network to declare its existence to Layer 2
hosts and routers. An RA message has a value of 134 in the
Type field.
Router Solicitation (RS) message: After being connected to the
network, a host immediately sends an RS message to obtain
network prefixes. Routers on the network reply with an RA
message. An RS message has a value of 133 in the Type field.
Address autoconfiguration
The process of IPv6 stateless autoconfiguration is as follows:
A host automatically configures the link-local address
based on the interface ID.
The host sends an NS message for duplicate address
detection.
If address conflict occurs, the host stops address
autoconfiguration. Then, the host address needs to be
configured manually.
If addresses do not conflict, the link-local address takes
effect. The host is connected to the network and can
communicate with nodes on the local link.
The host sends an RS message or receives RA
messages routers periodically send.
The host obtains the IPv6 address based on the prefix
carried in the RA message and the interface ID
generated in EUI-64 format.
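The EUI-64 step above (insert FFFE into the middle of the 48-bit MAC address and flip the universal/local bit) can be sketched as follows, assuming a colon-separated MAC string:

```python
def eui64_interface_id(mac: str) -> bytes:
    """Build the 64-bit interface ID in EUI-64 format: split the MAC in
    half, insert FFFE between the halves, and invert the U/L bit (bit 1
    of the first octet)."""
    octets = bytes(int(b, 16) for b in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    return bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:]

print(eui64_interface_id("00:1e:10:2a:3b:4c").hex())  # 021e10fffe2a3b4c
```

Combining this interface ID with the /64 prefix from an RA message yields the host's autoconfigured global address.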
To help hosts choose the optimal gateway, a gateway router sends a
Redirection message to notify the sender that packets can be sent through
another gateway router. A Redirection message is contained in an
ICMPv6 message. A Redirection message has the value of 137 in the
Type field and carries a better next hop address and destination
address of packets that need to be redirected.
The process of redirecting packets is as follows:
PC1 needs to communicate with PC2. By default, packets sent
from PC1 to PC2 are sent through R1. After receiving packets
from PC1, R1 finds that sending packets to R2 is much better.
R1 sends a Redirection message to PC1 to notify PC1 that R2
is a better next hop address. The destination address of PC2
is carried in the ICMPv6 Redirection message. After receiving
the Redirection message, PC1 adds a host route to the default
routing table. Packets sent to PC2 will be directly sent to R2.
A router sends a Redirection message in the following situations:
The destination address of the packet is not a multicast
address.
Packets are not routed to the router.
After route calculation, the outbound interface of the next hop
is the interface that receives the packets.
The router finds that a better next hop IP address of the packet
is on the same network segment as the source IP address of
the packet.
After checking the source address of the packet, the router
finds a neighboring device in the neighbor entries that uses
this address as the global unicast address or the link-local
address.
In IPv6, packets are fragmented on the source node to reduce the
pressure on the transit device.
The PMTU protocol is implemented through ICMPv6 Packet Too Big
messages. A source node first uses the MTU of its outbound interface
as the PMTU and sends a probe packet. If a smaller PMTU exists on
the transmission path, the transit device sends a Packet Too Big
message to the source node. The Packet Too Big message contains
the MTU value of the outbound interface on the transit device. After
receiving the message, the source node changes the PMTU value to
the received MTU value and sends packets based on the new MTU.
This process is repeated until packets are sent to the destination
address. Then, the source node obtains the PMTU of the destination
address.
The process of PMTU discovery is as follows:
Packets are transmitted through four links. The MTU values of
the four links are 1500, 1500, 1400, and 1300 bytes
respectively. Before sending a packet, the source node
fragments the packet based on PMTU 1500. When the packet
is sent to the outbound interface with MTU 1400, the router
returns a Packet Too Big message that carries MTU 1400.
After receiving the message, the source node fragments the
packet based on MTU 1400 and sends the fragmented packet
again.
When the packet is sent to the outbound interface with MTU
1300, the router returns another Packet Too Big message that
carries MTU 1300. The source node receives the message
and fragments the packet based on MTU 1300. In this way, the
source node sends the packet to the destination address and
discovers the PMTU of the transmission path.
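The walk above can be simulated in a few lines. This is a toy model (each list element is a hop's link MTU) rather than real packet handling:

```python
def discover_pmtu(link_mtus, initial_mtu=1500):
    """Simulate PMTU discovery: start from the source's own MTU and
    shrink each time a hop would return a Packet Too Big message."""
    pmtu, attempts = initial_mtu, 0
    while True:
        attempts += 1
        # The first hop whose MTU is smaller than the current PMTU drops
        # the packet and reports its own MTU in a Packet Too Big message.
        too_big = next((mtu for mtu in link_mtus if mtu < pmtu), None)
        if too_big is None:
            return pmtu, attempts  # packet reached the destination
        pmtu = too_big             # adopt the MTU carried in the ICMPv6 message

# The four-link example from the text: two retries, then success
print(discover_pmtu([1500, 1500, 1400, 1300]))  # (1300, 3)
```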
RIPng made the following modifications to RIP:
RIPng uses UDP port 521 (RIP uses UDP port 520) to send
and receive routing information.
RIPng uses the destination addresses with 128-bit prefixes
(mask length).
RIPng uses 128-bit IPv6 addresses as next hop addresses.
RIPng uses link-local addresses (FE80::/10) as the source
addresses of RIPng Update packets.
RIPng periodically sends routing information in multicast mode
and uses FF02::9 as the multicast address.
A RIPng packet consists of a header and multiple route table
entries (RTEs). In a RIPng packet, the maximum number of
RTEs depends on the MTU on the interface.
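The MTU-based RTE limit can be estimated from the fixed header sizes (IPv6 basic header 40 bytes, UDP header 8 bytes, RIPng header 4 bytes, 20 bytes per RTE); a small sketch with our own function name:

```python
def max_ripng_rtes(interface_mtu: int) -> int:
    """Upper bound on RTEs per RIPng Update packet: whatever fits in the
    interface MTU after the IPv6 basic header (40 B), the UDP header (8 B),
    and the 4-byte RIPng header. Each RTE occupies 20 bytes."""
    return (interface_mtu - 40 - 8 - 4) // 20

print(max_ripng_rtes(1500))  # 72
```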
OSPFv3 is based on links rather than network segments.
OSPFv3 runs on IPv6, which is based on links rather than
network segments.
Therefore, you do not need to configure OSPFv3 on the
interfaces in the same network segment. It is only required that
the interfaces enabled with OSPFv3 are on the same link. In
addition, the interfaces can set up OSPFv3 sessions without
IPv6 global addresses.
OSPFv3 does not depend on IP addresses.
This separates topology calculation from IP addresses.
That is, OSPFv3 can calculate the topology without
knowing IPv6 global addresses; global addresses are required
only on virtual link interfaces for packet forwarding.
OSPFv3 packets and LSA format change.
OSPFv3 packets do not contain IP addresses.
OSPFv3 router LSAs and network LSAs do not contain IP
addresses, which are advertised by Link LSAs and Intra Area
Prefix LSAs.
In OSPFv3, Router IDs, area IDs, and LSA link state IDs no
longer indicate IP addresses, but the IPv4 address format is
still reserved.
Neighbors are identified by Router IDs instead of IP addresses
in broadcast, NBMA, or P2MP networks.
Information about the flooding scope is added in LSAs of OSPFv3.
Information about the flooding scope is added in the LSA Type
field of OSPFv3 LSAs. Thus, OSPFv3 routers can process
LSAs of unidentified types, which makes the processing more
flexible.
OSPFv3 can store or flood unidentified packets,
whereas OSPFv2 just discards unidentified packets.
OSPFv3 floods packets in an OSPF area or on a link. It
sets the U flag bit of packets (the flooding area is
based on the link local) so that unidentified packets are
stored or forwarded to the stub area.
OSPFv3 supports multi-process on a link.
Only one OSPFv2 process can be configured on an OSPFv2
physical interface. In OSPFv3, one physical interface can be
configured with multiple processes that are identified by
different instance IDs.
OSPFv3 uses IPv6 link-local addresses.
As a routing protocol running on IPv6, OSPFv3 uses link-local addresses to maintain neighbor relationships and update
LSDBs. Except Vlink interfaces, all OSPFv3 interfaces use
link-local addresses as the source address and that of the next
hop to transmit OSPFv3 packets. The advantages are as
follows:
The OSPFv3 can calculate the topology without
knowing the global IPv6 addresses so that topology
calculation is independent of IP addresses.
The packets flooded on a link are not transmitted to
other links, which prevents unnecessary flooding and
saves bandwidth.
OSPFv3 packets do not contain authentication fields.
OSPFv3 directly adopts IPv6 authentication and security
measures. Thus, OSPFv3 does not need to perform
authentication. It only focuses on the processing of packets.
OSPFv3 supports two new LSAs.
Link LSA: A router floods a link LSA on the link where it
resides to advertise its link-local address and the configured
global IPv6 address.
Intra Area Prefix LSA: A router advertises an intra-area prefix
LSA in the local OSPF area to inform the other routers in the
area or the network, which can be a broadcast network or an
NBMA network, of its IPv6 global address.
OSPFv3 identifies neighbors based on router IDs only.
On broadcast, NBMA, and P2MP networks, OSPFv2 identifies
neighbors based on IPv4 addresses of interfaces.
OSPFv3 identifies neighbors based on router IDs only. Thus,
even if global IPv6 addresses are not configured or they are
configured in different network segments, OSPFv3 can still
establish and maintain neighbor relationships so that topology
calculation is not based on IP addresses.
Extended IS-IS for IPv6 is defined in the IETF draft
draft-ietf-isis-ipv6-05. To process and calculate IPv6 routes, IS-IS uses two new TLVs
and one network layer protocol identifier (NLPID).
The two TLVs are as follows:
TLV 236 (IPv6 Reachability): describes network reachability by
defining the route prefix and metric.
TLV 232 (IPv6 Interface Address): is similar to the IP Interface
Address TLV of IPv4, except that it changes a 32-bit IPv4
address to a 128-bit IPv6 address.
The NLPID is an 8-bit field that identifies the protocol packets of the
network layer. The NLPID of IPv6 is 142 (0x8E). If IS-IS supports IPv6,
it advertises routing information through the NLPID value.
To support multiple network layer protocols, BGP requires NLRI and
Next_Hop attributes to carry information about network layer protocols.
Therefore, MP-BGP uses the following new optional non-transitive
attributes:
MP_REACH_NLRI: indicates the multiprotocol reachable NLRI.
It is used to advertise reachable routes and next hop
information.
MP_UNREACH_NLRI: indicates the multiprotocol unreachable
NLRI. It is used to withdraw unreachable routes.
Multicast Listener Discovery (MLD) is a protocol that manages IPv6
multicast members. Its principles and functions are similar to those of IGMP.
MLD enables each IPv6 router to discover its directly
connected multicast listeners (nodes that expect to receive multicast
data) and learn the multicast addresses that the neighbor nodes are
interested in. Then, MLD delivers the learnt information to the multicast
routing protocols used by the routers to ensure that multicast data can
be sent to all links where the receivers reside.
Querier election mechanism
The working mechanism is similar to IGMPv2:
Each MLD router considers itself as a querier when it
starts and sends a General Query message with
destination address FF02::1 to all hosts and routers on
the local network segment.
When the routers receive a General Query message,
they compare the source IPv6 address of the message
with their own interface IPv6 address. The router with
the smallest IPv6 address becomes the querier, and
the other routers are considered non-queriers.
All non-queriers start a timer (Other Querier Present
Timer). If non-queriers receive a Query message from
the querier before the timer expires, they reset the
timer. If non-queriers receive no Query message from
the querier when the timer expires, they trigger election
of a new querier.
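The lowest-address rule in the election above can be expressed directly (a toy sketch using Python's standard ipaddress module):

```python
import ipaddress

def elect_querier(router_addresses):
    """Pick the MLD querier among routers on a segment: the router with
    the numerically smallest IPv6 interface address wins."""
    return min(router_addresses, key=lambda a: int(ipaddress.IPv6Address(a)))

# Three routers on one link: fe80::1 becomes the querier
print(elect_querier(["fe80::3", "fe80::1", "fe80::2"]))  # fe80::1
```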
Member join mechanism
PC2 and PC3 need to receive IPv6 multicast data destined for
IPv6 multicast group G1, and PC1 needs to receive IPv6
multicast data destined for IPv6 multicast group G2. The hosts
need to join their respective multicast groups, and then the
MLD querier (R1) needs to maintain IPv6 group memberships.
The query and report process is as follows:
Hosts send Multicast Listener Report messages to the
IPv6 multicast groups that they want to join without
waiting to receive a Query message from the MLD
querier.
The MLD querier (R1) periodically multicasts General
Query messages with destination address FF02::1 to
all hosts and routers on the local network segment.
After PC2 and PC3 receive the Query message, the
host whose delay timer expires first sends a Report
message to G1. If the delay timer of PC2 expires first,
PC2 multicasts a Report message to G1, declaring that
it belongs to G1. All hosts on the local network
segment can receive the Report message sent from
PC2 to G1. When PC3 receives this Report message, it
does not send the same Report message to G1
because MLD routers (R1 and R2) have known that G1
has members on the local network segment. This
mechanism suppresses duplicate Report messages,
reducing information traffic on the local network
segment.
PC1 still needs to multicast a Report message to G2,
declaring that it belongs to G2.
After receiving the Report messages, MLD routers
know that multicast groups G1 and G2 have members
on the local network segment. Then the routers use
IPv6 multicast routing protocols (such as IPv6 PIM) to
create (*, G1) and (*, G2) entries for multicast data
forwarding, in which * stands for any multicast source.
When IPv6 multicast data sent from an IPv6 multicast
source reaches the MLD routers through multicast
routes, the MLD routers forward the received multicast
data to the local network segment because they have
(*, G1) and (*, G2) entries. Subsequently, receiver
hosts can receive the IPv6 multicast data.
Member Leave Mechanism
The host sends a Done message with destination
address FF02::2 to all IPv6 multicast routers on the
local network segment.
When the MLD querier receives the Done message, it
sends a Multicast-Address-Specific Query message to
the IPv6 multicast group that the host wants to leave.
The destination address and group address of the
Query message are the address of this IPv6 multicast
group.
If the IPv6 multicast group has other members on the
network segment, the members send a Report
message within the maximum response time.
If the querier receives the Report messages from other
members within the maximum response time, the
querier continues to maintain memberships of the IPv6
multicast group. Otherwise, the querier considers that
the IPv6 multicast group has no member on the local
network segment and stops maintaining memberships
of the IPv6 multicast group.
IPv6 multicast source filtering
MLDv2 supports IPv6 multicast source filtering and defines two
filter modes: INCLUDE and EXCLUDE. When a host joins an
IPv6 multicast group G, the host can choose to accept or reject
IPv6 multicast data from a specific source S. When a host
joins an IPv6 multicast group:
If the host only needs to receive data sent from sources
S1, S2, and so on, the host can send a Report
message with an INCLUDE Sources (S1, S2, ...) entry.
If the host wants to reject data sent from sources S1,
S2, and so on, the host can send a Report message
with an EXCLUDE Sources (S1, S2, ...) entry.
IPv6 Multicast Group Status Tracking
Multicast routers running MLDv2 maintain IPv6 multicast group
state per multicast address per attached link. The
IPv6 multicast group state includes:
Filter mode: The MLD querier tracks the INCLUDE or
EXCLUDE state.
Source list: The MLD querier tracks the sources that
are added or deleted.
Timers: include a filter timer, on whose expiry the MLD
querier switches the IPv6 multicast address back to
INCLUDE mode, and a source timer for each source
record.
Receiver Host Status Listening
Multicast routers running MLDv2 listen to the receiver host
status to record and maintain information about hosts that join
IPv6 multicast groups on the local network segment.
IPv4/IPv6 dual stack is an efficient technology that implements IPv4-to-IPv6 transition. In IPv4/IPv6 dual stack, network devices support both
the IPv4 protocol stack and IPv6 protocol stack. The source device
selects a protocol stack according to the IP address of the destination
device. Network devices between the source and destination devices
select a protocol stack to process and forward packets according to the
packet protocol type. IPv4/IPv6 dual stack can be implemented on a
single device or on a dual-stack backbone network. On a dual-stack
backbone network, all devices must support the IPv4/IPv6 dual stack,
and interfaces connected to the dual-stack network must have both
IPv4 and IPv6 addresses configured.
The topology is described as follows:
The host sends a DNS request to the DNS server for the IP
address of domain name www.huawei.com. The DNS server
replies with the requested IP address of the domain name. The
IP address may be 10.1.1.1 or 3ffe:yyyy::1. If the host sends a
class-A query, the DNS server replies with the IPv4 address of
the domain name. When the host sends a class-AAAA query,
the DNS server replies with the IPv6 address of the domain
name.
R1 in the figure supports IPv4/IPv6 dual stack. If the host
needs to access the network server at IPv4 address 10.1.1.1, the
host can access the network server through the IPv4 protocol
stack of R1. If the host needs to access the network server at
IPv6 address 3ffe:yyyy::1, the host can access the network
server through the IPv6 protocol stack of R1.
During early transition, IPv4 networks are widely deployed, while IPv6
networks are isolated islands. IPv6 over IPv4 tunneling allows IPv6
packets to be transmitted on an IPv4 network, interconnecting all IPv6
islands.
Principles are as follows:
IPv4/IPv6 dual stack is enabled and an IPv6 over IPv4 tunnel
is deployed on edge routing devices.
After an edge routing device receives a packet from the IPv6
network, the device appends an IPv4 header to the IPv6
packet to encapsulate the IPv6 packet as an IPv4 packet if the
destination address of the packet is not the device and the
outbound interface of the packet is a tunnel interface.
On the IPv4 network, the encapsulated packet is transmitted to
the remote edge routing device.
The remote edge routing device decapsulates the packet,
removes the IPv4 header, and then sends the decapsulated
IPv6 packet to the connected IPv6 network.
The IPv4 address of the source end of an IPv6 over IPv4 tunnel must
be manually configured, but the IPv4 address of the destination end
can be manually configured or automatically obtained. An IPv6 over
IPv4 tunnel can be a manual or an automatic tunnel depending on how
the destination end of the tunnel obtains its IPv4 address.
Manual tunnel: The edge routing device cannot automatically
obtain the IPv4 address of the destination end, which must be
manually configured so that the packets can be correctly
forwarded to the tunnel end.
Automatic tunnel: The edge routing device can automatically
obtain the IPv4 address of the destination end and does not
require you to manually configure an IPv4 address for the
destination end. In most cases, two interfaces on both ends of
an automatic tunnel use IPv6 addresses that contain
embedded IPv4 addresses so that the destination IPv4
address can be extracted from the destination IPv6 address of
IPv6 packets.
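The "embedded IPv4 address" extraction mentioned above can be illustrated with Python's standard ipaddress module, for both 6to4 (2002::/16) and IPv4-compatible (::a.b.c.d) addresses; the function name is our own:

```python
import ipaddress

def embedded_ipv4(addr: str):
    """Extract the IPv4 address embedded in IPv6 addresses used by
    automatic tunnels: 6to4 carries it in bits 16-47, and an
    IPv4-compatible address carries it in the low-order 32 bits."""
    a = ipaddress.IPv6Address(addr)
    if a.sixtofour is not None:          # 2002::/16 (6to4)
        return str(a.sixtofour)
    if int(a) >> 32 == 0:                # zeros in the high-order 96 bits
        return str(ipaddress.IPv4Address(int(a) & 0xFFFFFFFF))
    return None                          # no embedded IPv4 address

print(embedded_ipv4("2002:c0a8:101::1"))  # 192.168.1.1
print(embedded_ipv4("::10.1.1.1"))        # 10.1.1.1
```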
If an edge routing device needs to set up a manual tunnel with multiple
devices, multiple tunnels must be configured on the edge routing
device. Such configuration is complex. To simplify the configuration, a
manual tunnel is often set up between two edge routing devices to
connect two IPv6 networks.
The manual tunnel has advantages and disadvantages:
Advantage: applies to any environment in which IPv6
traverses IPv4.
Disadvantage: must be manually configured.
Packets are transmitted in an IPv6 over IPv4 manual tunnel as follows:
When an edge device of the tunnel receives an IPv6 packet
from an IPv6 network, the device searches the IPv6 routing
table according to the destination address of the IPv6 packet.
If the packet is forwarded from the virtual tunnel interface, the
device encapsulates the packet according to the tunnel source
and destination IPv4 addresses configured for the tunnel
interface. The encapsulated packet becomes an IPv4 packet,
which is then processed by the IPv4 protocol stack. The IPv4
packet is forwarded to the destination end of the tunnel over
an IPv4 network. After the destination end of the tunnel
receives the IPv4 packet, it decapsulates the packet and
sends the decapsulated packet to the IPv6 protocol stack.
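The encapsulation step above can be sketched in Python. This is a minimal illustration under stated assumptions, not a device implementation: it prepends a bare 20-byte IPv4 header built with struct, sets the Protocol field to 41 (the value registered for IPv6-in-IPv4 encapsulation), leaves the header checksum at zero, and uses 1.1.1.1/2.2.2.2 as placeholder tunnel endpoints.

```python
import struct
import socket

def encapsulate_6in4(ipv6_packet: bytes, tunnel_src: str, tunnel_dst: str) -> bytes:
    """Prepend a minimal IPv4 header (Protocol = 41) to an IPv6 packet."""
    version_ihl = (4 << 4) | 5              # IPv4, 20-byte header (IHL = 5)
    total_length = 20 + len(ipv6_packet)
    header = struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, total_length,
        0, 0,                               # identification, flags/fragment offset
        64, 41, 0,                          # TTL, protocol 41 = IPv6-in-IPv4, checksum omitted
        socket.inet_aton(tunnel_src),
        socket.inet_aton(tunnel_dst),
    )
    return header + ipv6_packet

# Dummy IPv6 payload (version nibble 6, rest zeroed) just to show the framing.
inner = bytes([0x60]) + bytes(39)
outer = encapsulate_6in4(inner, "1.1.1.1", "2.2.2.2")
print(len(outer), outer[9])                 # total length and Protocol field
```

The remote tunnel endpoint reverses the step: it strips the first 20 bytes and hands the remaining IPv6 packet to the IPv6 protocol stack.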
An IPv6 over IPv4 GRE tunnel uses standard GRE tunneling
technology to provide a point-to-point connection and requires tunnel
endpoint addresses to be manually configured. GRE tunnels have no
limitations on the encapsulation protocol and transport protocol, which
can be any protocol such as IPv4, IPv6, OSI, or Multiprotocol Label
Switching (MPLS).
Packet forwarding on an IPv6 over IPv4 GRE tunnel is similar to that on
an IPv6 over IPv4 manual tunnel.
The destination address of IPv6 packets transmitted over an automatic
IPv4-compatible IPv6 tunnel is an IPv4-compatible IPv6 address (the
special address used by the automatic tunnel). An IPv4-compatible
IPv6 address is an IPv6 unicast address that has zeros in the high-order 96 bits and an IPv4 address in the low-order 32 bits.
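The address layout can be verified with Python's ipaddress module. This is a sketch of the format only, using the ::2.1.1.1 address from the forwarding example below: the high-order 96 bits are zero, so the tunnel endpoint can recover the embedded IPv4 destination directly from the low-order 32 bits.

```python
import ipaddress

def ipv4_compatible(v4: str) -> ipaddress.IPv6Address:
    """Build an IPv4-compatible IPv6 address: 96 zero bits + IPv4 address."""
    return ipaddress.IPv6Address(int(ipaddress.IPv4Address(v4)))

addr = ipv4_compatible("2.1.1.1")
print(addr == ipaddress.IPv6Address("::2.1.1.1"))      # the two forms are equal
# Extract the embedded IPv4 address (what the tunnel uses as its destination).
embedded = ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)
print(embedded)
```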
Disadvantages of an automatic IPv4-compatible IPv6 tunnel:
An automatic IPv4-compatible IPv6 tunnel requires that each
host on both ends have a valid IPv4 address and support the
IPv4/IPv6 dual stack and automatic IPv4-compatible IPv6
tunnels. Therefore, automatic IPv4-compatible IPv6 tunnels
cannot be deployed on a large scale and have been replaced by
automatic 6to4 tunnels.
Packet forwarding process is as follows:
After R1 receives an IPv6 packet destined for R2, R1 searches
for an IPv6 route according to destination address ::2.1.1.1,
and finds that the next hop is a tunnel interface. The tunnel
configured on R1 is an automatic IPv4-compatible IPv6 tunnel.
Therefore, R1 encapsulates the IPv6 packet into an IPv4
packet. In the IPv4 packet, the source address is the tunnel
source address 1.1.1.1, and the destination address is the low-order 32 bits of IPv4-compatible IPv6 address ::2.1.1.1,
namely, 2.1.1.1. The IPv4 packet is forwarded by the tunnel
interface on R1 over the IPv4 network to R2 at 2.1.1.1.
After R2 receives the IPv4 packet, it decapsulates the IPv4
packet to obtain the IPv6 packet and sends the IPv6 packet to
the IPv6 protocol stack for processing. An IPv6 packet is sent
from R2 to R1 following a similar process.
A 6to4 tunnel is another kind of automatic tunnel and is set up
using the IPv4 address embedded in an IPv6 address. Unlike an
automatic IPv4-compatible IPv6 tunnel, the 6to4 automatic tunnel can
be set up from a router to a router, from a host to a router, from a router
to a host, and from a host to a host.
The address format is as follows:
FP: is the format prefix of aggregatable global unicast
addresses and fixed as 001.
TLA: is short for top level aggregator and fixed as 0x0002.
SLA: is short for site level aggregator.
A 6to4 address starts with the prefix 2002::/16 in the format of
2002:IPv4-address::/48. A 6to4 address has a 64-bit network prefix, in
which the first 48 bits (2002:a.b.c.d) consist of the fixed 2002 prefix
followed by the IPv4 address assigned to a router interface and cannot
be changed, and the last 16 bits (the SLA ID) can be configured by the user.
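The prefix derivation can be sketched in Python. The router interface address 2.1.1.1 below is reused from the earlier tunnel examples as an assumed value: the 2002::/16 prefix plus the 32-bit IPv4 address gives the fixed /48, and the 16-bit SLA ID then selects a /64 subnet inside it.

```python
import ipaddress

def sixto4_prefix(v4: str) -> ipaddress.IPv6Network:
    """2002:a.b.c.d::/48 - the IPv4 address fills bits 16..47 of the prefix."""
    net = (0x2002 << 112) | (int(ipaddress.IPv4Address(v4)) << 80)
    return ipaddress.IPv6Network((net, 48))

prefix = sixto4_prefix("2.1.1.1")
print(prefix)                                            # the fixed /48
# SLA ID 5 (user-configurable 16 bits) selects one /64 inside the /48.
sla = ipaddress.IPv6Network((int(prefix.network_address) | (5 << 64), 64))
print(sla)
```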
An IPv4 address can only be used as the source address of one 6to4
tunnel. If one edge router connects to multiple 6to4 networks and uses
the same IPv4 address as the tunnel source address, SLA IDs in 6to4
addresses are used to differentiate the 6to4 networks. These 6to4
networks, however, share the same 6to4 tunnel.
Common IPv6 networks need to communicate with 6to4 networks over
IPv4 networks. This requirement can be met through 6to4 relays. A
6to4 relay is a next-hop device that forwards IPv6 packets of which the
destination address is not a 6to4 address but the next-hop address is a
6to4 address. The tunnel destination IPv4 address is obtained from the
next-hop 6to4 address.
If a host on 6to4 network 2 needs to communicate with devices on the
IPv6 network, a route must be configured on the edge router, and the
next-hop address of the route to the IPv6 network is specified as the
6to4 address of the 6to4 relay. The 6to4 address of the relay matches
the source address of the 6to4 tunnel. Packets to be sent from 6to4
network 2 to the IPv6 network are first sent to the 6to4 relay according
to the next hop specified in the routing table. The 6to4 relay then
forwards the packet to the IPv6 network. When a packet needs to be
sent from the IPv6 network to 6to4 network 2, the 6to4 relay
encapsulates the packet as an IPv4 packet according to the destination
address (a 6to4 address) of the packet so that the packet can be
successfully sent to 6to4 network 2.
Intra-Site Automatic Tunnel Addressing Protocol (ISATAP) is another
automatic tunneling mechanism. An ISATAP tunnel uses an IPv6
address with an embedded IPv4 address. An ISATAP address uses an
IPv4 address as the interface identifier, while a 6to4 address uses an
IPv4 address as the network prefix.
The address is described as follows:
If the IPv4 address is globally unique, the u bit is 1. Otherwise,
the u bit is 0. The g bit indicates whether the IPv4 address is
unicast or multicast. An ISATAP address can be a global
unicast address, link-local address, unique local address, or
multicast address. The first 64 bits of an ISATAP address are
obtained through a request sent to an ISATAP router and can
be automatically configured. The Neighbor Discovery (ND)
protocol can run between edge devices on both ends of an
ISATAP tunnel. An ISATAP tunnel regards an IPv4 network as
a non-broadcast multi-access (NBMA) network.
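The ISATAP address construction can be sketched in Python. This is an illustrative sketch: the interface ID embeds the IPv4 address as 0:5efe:w.x.y.z when the IPv4 address is private (u bit 0) or 200:5efe:w.x.y.z when it is globally unique (u bit 1), and combining it with the fe80::/64 prefix yields the link-local address the hosts derive first. The sample IPv4 addresses are assumptions.

```python
import ipaddress

def isatap_link_local(v4: str, globally_unique: bool = False) -> ipaddress.IPv6Address:
    """fe80:: + ISATAP interface ID; the u bit is set only for a globally unique IPv4 address."""
    tag = 0x02005EFE if globally_unique else 0x00005EFE
    iid = (tag << 32) | int(ipaddress.IPv4Address(v4))   # 64-bit interface ID
    return ipaddress.IPv6Address((0xFE80 << 112) | iid)

print(isatap_link_local("192.168.1.1"))        # private IPv4, u bit 0
print(isatap_link_local("1.1.1.1", True))      # globally unique IPv4, u bit 1
```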
The forwarding process is described as follows:
The IPv4 network has two dual-stack hosts PC2 and PC3,
each of which has a private IPv4 address. To implement the
ISATAP function, perform the following operations:
Configure ISATAP tunnel interfaces. The hosts
generate ISATAP interface IDs according to their IPv4
addresses.
The hosts then generate a link-local IPv6 address
according to the ISATAP interface identifier. Then the
two hosts have IPv6 communication capabilities on the
local link.
The hosts perform address autoconfiguration and
obtain IPv6 global unicast addresses and ULA
addresses.
The host extracts an IPv4 address from the next-hop
IPv6 address to use as the tunnel destination address, and forwards
packets through the tunnel interface to communicate
with another IPv6 host. If the destination host is within
the local site, the next hop is the destination host. If the
destination host is in a different site, the next hop
address is the address of the ISATAP router.
During a later stage of the IPv4-to-IPv6 transition, IPv6 networks are
widely deployed, while IPv4 networks become isolated islands around
the world. You
can create a tunnel on an IPv6 network to connect isolated IPv4 sites
so that isolated IPv4 sites can access other IPv4 networks through the
IPv6 public network.
The forwarding process is described as follows:
IPv4/IPv6 dual stack is enabled and an IPv4 over IPv6 tunnel
is deployed on edge routing devices.
After the edge routing device receives a packet from the
connected IPv4 network, it adds an IPv6 header to the IPv4
packet to encapsulate the IPv4 packet as an IPv6 packet if the
destination address of the packet is not the routing device.
On the IPv6 network, the encapsulated packet is transmitted to
the remote edge routing device.
The remote edge routing device decapsulates the packet,
removes the IPv6 header, and then sends the decapsulated
IPv4 packet to the connected IPv4 network.
Example description:
The device addresses are determined as follows:
If RTX connects to RTY, the addresses of the two
devices are 2001:XY::X/64 and 2001:XY::Y/64
respectively.
The commands and their functions are as follows:
ripng: creates a RIPng process.
ripng enable: enables RIPng on an interface.
ripng metricout: sets the metric that is added to the RIPng
route sent by an interface.
import-route: configures RIPng to import routes from other
routing protocols. You can use the route-policy parameter to
filter routes to be imported and configure route properties.
Precautions:
The policy usage is similar to that in IPv4.
Example description:
The device addresses are determined as follows:
If RTX connects to RTY, the addresses of the two
devices are 2001:XY::X/64 and 2001:XY::Y/64
respectively.
The commands and their functions are as follows:
router-id: configures the ID of the router running OSPFv3.
ospfv3 area: enables the OSPFv3 process on an interface
and specifies the area the process belongs to.
nssa: configures an OSPFv3 area as an NSSA.
undo ipv6 nd ra halt: enables the system to send RA packets.
ipv6 address auto global: enables a device to automatically
generate a global IPv6 address through stateless
autoconfiguration.
Precautions:
OSPFv3 has features similar to those of OSPFv2.
Example description:
The device addresses are determined as follows:
If RTX connects to RTY, the addresses of the two
devices are 2001:XY::X/64 and 2001:XY::Y/64
respectively.
The commands and their functions are as follows:
ipv6 enable: enables the IPv6 capability of an IS-IS process.
ipv6 nd ra prefix: configures the prefix in an RA packet.
isis ipv6 enable: enables the IS-IS IPv6 capability for an
interface and specifies the ID of the IS-IS process to be
associated with the interface.
ipv6 import-route isis level-2 into level-1: configures IPv6
route importing from Level-2 areas to Level-1 areas.
Precautions:
IS-IS IPv6 has features similar to those of IS-IS IPv4.
Example description:
The device addresses are determined as follows:
If RTX connects to RTY, the addresses of the two
devices are 2001:XY::X/64 and 2001:XY::Y/64
respectively.
The commands and their functions are as follows:
peer { ipv6-address | group-name } as-number as-number:
creates a peer or configures an AS number for a specified
peer group.
ipv6-family: displays the IPv6 address family view of BGP.
peer enable: enables a BGP device to exchange routes with a
specified peer or peer group in the address family view.
peer connect-interface: specifies a source interface from
which BGP packets are sent, and a source address used for
initiating a connection.
peer password: enables a BGP device to implement MD5
authentication for BGP messages exchanged during the
establishment of a TCP connection with a peer.
Precautions:
BGP4+ has features similar to those of BGP.
Example description:
IPv6 and IPv4 addresses have been specified.
The commands and their functions are as follows:
interface tunnel: creates a tunnel interface and displays the
tunnel interface view.
tunnel-protocol ipv6-ipv4: sets the tunnel mode to IPv6 over
IPv4 manual tunnel.
source { ipv4-address | interface-type interface-number }:
specifies the source interface of a tunnel.
destination { ipv4-address }: specifies the destination
address of a tunnel.
ipv6 address { ipv6-address prefix-length }: configures IPv6
addresses for tunnel interfaces.
Example description:
IPv6 and IPv4 addresses have been specified.
The commands and their functions are as follows:
interface tunnel: creates a tunnel interface and displays the
tunnel interface view.
tunnel-protocol gre: sets the tunnel mode to IPv6 over IPv4
GRE tunnel.
source { ipv4-address | interface-type interface-number }:
specifies the source interface of the tunnel.
destination { ipv4-address }: specifies the destination
address of a tunnel.
ipv6 address { ipv6-address prefix-length }: configures IPv6
addresses for tunnel interfaces.
MPLS VPN overview
A BGP/MPLS IP VPN is a Layer 3 Virtual Private Network
(L3VPN). It uses the Border Gateway Protocol (BGP) to
advertise VPN routes and uses Multiprotocol Label Switching
(MPLS) to forward VPN packets on the backbone network of
the Service Provider (SP). This technology is called IP VPN
because IP packets are transmitted on VPNs.
The BGP/MPLS IP VPN model consists of the following
entities:
Customer Edge (CE): a device that is deployed at the
edge of a customer network and has interfaces directly
connected to the SP network. A CE device can be a
router, switch, or host. Generally, CE devices cannot
detect VPNs and do not need to support MPLS.
Provider Edge (PE): a device that is deployed at the
edge of an SP network and directly connected to a CE
device. On an MPLS network, PE devices process all
VPN services and must have high performance.
Provider (P): a backbone device that is deployed on an
SP network and is not directly connected to CE devices.
P devices only need to provide basic MPLS forwarding
capabilities and do not maintain VPN information.
PE and P devices are managed by SPs. CE devices are
managed by customers unless customers authorize SPs to
manage their CE devices.
A PE device can connect to multiple CE devices. A CE device
can connect to multiple PE devices of the same SP or different
SPs.

Site
A site is a group of IP systems with IP connectivity, which can
be achieved independently of SP networks.
Sites are classified based on the topology between devices rather
than their geographic locations, although devices in a site are
geographically adjacent to each other in most situations.
The devices in a site may belong to multiple VPNs. That is, a
site may belong to more than one VPN.
Different VPN sites can use overlapping address spaces.
A PE device establishes and maintains a VPN instance for each
directly connected site. A VPN instance contains VPN member
interfaces and routes of the corresponding site. Specifically, information
in a VPN instance includes the IP routing table, label forwarding table,
interface bound to the VPN instance, and VPN instance management
information. VPN instance management information includes the route
distinguisher (RD), route filtering policy, and member interface list of
the VPN instance.
A public routing and forwarding table and a VRF differ in the following
aspects:
A public routing table contains IPv4 routes of all the PE and P
devices. The routes are static routes or dynamic routes
generated by routing protocols on the backbone network.
A VPN routing table contains routes of all sites that belong to a
VPN instance. The routes are obtained through the exchange
of VPN routing information between PE devices or between
CE and PE devices.
Information in a public forwarding table is extracted from the
public routing table according to route management policies,
whereas information in a VPN forwarding table is extracted
from the corresponding VPN routing table.
VPN instances on a PE device are independent of each other
and maintain a VRF independent of the public routing and
forwarding table. Each VPN instance can be considered as a
virtual device, which maintains an independent address space
and connects to VPNs through interfaces.
The PE devices use Multiprotocol Extensions for BGP-4 (MP-BGP) to
advertise VPN routes and use the VPN-IPv4 address family to solve
the problem that BGP cannot distinguish VPN routes with the same IP
address prefix.
RDs distinguish the IPv4 prefixes with the same address space. The
RD format enables SPs to allocate RDs independently. When CE
devices are dual-homed to PE devices, RDs must be globally unique to
ensure correct routing.
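The effect of the RD can be sketched with a small Python example. The RD values 100:1 and 100:2 and the 10.1.1.0/24 prefix are placeholders, not from the original material: two customers use the same address space, yet prefixing each route with its VRF's RD yields distinct VPN-IPv4 routes that MP-BGP can carry side by side.

```python
import ipaddress

# Keying routes by (RD, prefix): identical IPv4 prefixes from different
# VPNs no longer collide once each carries its VRF's route distinguisher.
vpnv4_routes = {
    ("100:1", ipaddress.IPv4Network("10.1.1.0/24")): "customer A",
    ("100:2", ipaddress.IPv4Network("10.1.1.0/24")): "customer B",
}
print(len(vpnv4_routes))    # two distinct VPN-IPv4 routes despite equal prefixes
```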
A VPN target, also called the route target (RT), is a BGP
extended community attribute. BGP/MPLS IP VPN uses VPN targets to
control the advertisement of VPN routes.
A VPN instance is associated with one or more VPN target attributes.
VPN target attributes are classified into the following types:
Export target: After a PE device learns IPv4 routes from
directly connected sites, it converts the routes to VPN-IPv4
routes and sets the export target attribute for those routes. The
export target attribute is advertised with the routes as a BGP
extended community attribute.
Import target: After a PE device receives VPN-IPv4 routes
from other PE devices, it checks the export target attribute of
the routes. If the export target is the same as the import target
of a VPN instance on the local PE device, the local PE device
adds the route to the VPN routing table.
A VPN target defines which sites can receive a VPN route and
from which sites a PE device can receive VPN routes.
The reasons for using the VPN target instead of the RD as the
extended community attribute are as follows:
A VPN-IPv4 route has only one RD, but can be associated
with multiple VPN targets. With multiple extended community
attributes, BGP can greatly improve the flexibility and
expansibility of a network.
VPN targets can be used to control route advertisement
between different VPNs on a PE device. With properly
configured VPN targets, different VPN instances on a PE
device can import routes from each other.
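The import-target matching rule described above can be sketched as follows. This is a simplified model with placeholder RT values: a received VPN-IPv4 route is installed into a VRF when any of the route's export RTs intersects the VRF's import RTs, so one shared RT is enough to leak a route into another VPN.

```python
def import_routes(received, import_targets):
    """Install a route if any of its export RTs matches an import RT."""
    targets = set(import_targets)
    return [prefix for prefix, export_rts in received
            if targets & set(export_rts)]

received = [
    ("10.1.1.0/24", ["100:1"]),             # exported by VPN A sites
    ("10.2.2.0/24", ["100:2", "100:99"]),   # exported by VPN B, plus a shared RT
]
print(import_routes(received, ["100:1"]))             # only the VPN A route
print(import_routes(received, ["100:1", "100:99"]))   # both routes installed
```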
Traditional BGP-4 defined in RFC 1771 can manage only the IPv4
routes but cannot process VPN routes that have overlapping address
spaces.
To correctly process VPN routes, VPNs use MP-BGP defined in RFC
2858 (Multiprotocol Extensions for BGP-4). MP-BGP supports multiple
network layer protocols. Network layer protocol information is contained
in the Network Layer Reachability Information (NLRI) field and the Next
Hop field of an MP-BGP Update message.
MP-BGP uses the address family to differentiate network layer
protocols. An address family can be a traditional IPv4 address family or
any other address family, such as a VPN-IPv4 address family or an
IPv6 address family. For the values of address families, see RFC 1700
(Assigned Numbers).
The PE and CE devices exchange routing information through standard
BGP, OSPF, IS-IS, RIP or static routes. During the process, the PE
device needs to store routes received from the CE devices to different
VRFs. Other operations are the same as those for common route
exchange. You can configure the same routing protocol for all the CE
devices. However, you must configure different instances for each VRF
of a PE device. The instances do not interfere with each other.
After PE1 receives an IPv4 route from CE1, PE1 adds the
manually configured RD of the VRF to the route to convert the IPv4
route into a VPNv4 route. Then PE1 changes the Next_Hop attribute
in the route advertisement to its own loopback address and adds a
VPN label (dynamically assigned by MP-IBGP) to the route. After that, PE1 adds the
Export Route Target attribute to the route and sends the route to all the
PE neighbors. In VRP5.3, after MPLS is enabled on PE1, PE1 uses
MP-BGP to allocate VPN labels to private network routes. PE devices
can then correctly exchange VPN routes.
When multiple CE devices in a VPN site connect to different PE
devices, VPN routes advertised from the CE devices to the PE devices
may be sent back to the VPN site after the routes traverse the
backbone network. This may cause routing loops in the VPN site. The
Site of Origin (SoO) attribute specifies the source site and prevents routing
loops.
After PE2 receives a VPNv4 route advertised by PE1, PE2 converts the
VPNv4 route into an IPv4 route and adds the IPv4 route to the
corresponding VRF based on the import target attribute of the route.
The VPN label of the route is retained for packet forwarding. PE2
forwards the IPv4 route to the corresponding CE device through the
routing protocol between the PE and CE devices. The next hop in the
route is the IP address of PE2's interface.
Data exchanged between VPN sites needs to be forwarded through the
MPLS backbone network based on MPLS labels. The process for
allocating public network labels (outer labels) is as follows:
The PE and P routers learn BGP next hop IP addresses using an IGP,
assign outer labels using LDP, and establish LSPs. A label stack is
used for packet forwarding. An outer label directs packets to the BGP
next hop. An inner label indicates the outbound interface for the packet
or the VPN instance to which the packet belongs. MPLS forwarding is
based on only outer labels and is irrelevant to the inner labels.
CE2 sends an IP packet destined for CE1. After receiving the packet,
PE2 encapsulates an inner label 15362 and then an outer label 1024 to
the packet and forwards the packet to the P device. After receiving the
packet, the penultimate hop P pops out the outer label, retains the inner
label, and forwards the packet to PE1 based on the outer label. PE1
determines the VPN site to which the packet belongs based on the
inner label, removes the inner label, and forwards the packet to CE1.
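The label-stack walk above can be sketched as a toy simulation, reusing the labels 1024 (outer) and 15362 (inner) from the example; the VPN instance name is a placeholder. It is a model of the forwarding logic only, not of real MPLS data-plane behavior.

```python
# Ingress PE2 pushes the inner VPN label, then the outer LDP label; the
# penultimate-hop P pops the outer label (PHP); egress PE1 maps the
# remaining inner label to a VPN instance and forwards to the CE.
VPN_LABELS = {15362: "vpn-to-CE1"}      # inner label -> VPN instance on PE1

def pe2_encapsulate(packet):
    return [1024, 15362], packet        # label stack, outer label first

def p_penultimate_hop(labels, packet):
    return labels[1:], packet           # outer label popped, inner retained

def pe1_forward(labels, packet):
    return VPN_LABELS[labels[0]], packet  # inner label selects the VPN instance

labels, pkt = pe2_encapsulate("IP packet from CE2 to CE1")
labels, pkt = p_penultimate_hop(labels, pkt)
print(pe1_forward(labels, pkt)[0])      # the VPN instance the packet belongs to
```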
Case description
In this case, the addresses for interconnecting devices are as
follows:
If RTX interconnects with RTY, the addresses are
XY.1.1.X and XY.1.1.Y, network mask is 24.
Assume that PE1 is RT1, PE2 is RT2, P is RT3.
Command usage
ip binding vpn-instance: binds the current AC interface to a
specified VPN instance.
ipv4-family: enters the IPv4 address family view of BGP.
Precautions
After a VPN instance is bound to or unbound from an interface,
Layer 3 features such as IP address and routing protocol are
deleted from the interface. If such features are required, you
need to re-configure them.
Case description
In this case, the addresses for interconnecting devices are as
follows:
If RTX interconnects with RTY, the addresses are
XY.1.1.X and XY.1.1.Y, network mask is 24.
Assume that PE1 is RT1, PE2 is RT2, P is RT3.
Command usage
ip binding vpn-instance: binds the current AC interface to a
specified VPN instance.
ipv4-family: enters the IPv4 address family view of BGP.
Precautions
Specify a VPN instance for each RIP process on the PE
device.
Case description
In this case, the addresses for interconnecting devices are as
follows:
If RTX interconnects with RTY, the addresses are
XY.1.1.X and XY.1.1.Y, network mask is 24.
Assume that PE1 is RT1, PE2 is RT2, P is RT3.
Command usage
ip binding vpn-instance: binds the current AC interface to a
specified VPN instance.
ipv4-family: enters the IPv4 address family view of BGP.
Precautions
Specify a VPN instance for each IS-IS process on the PE
device.
Deleting a VPN instance or disabling a VPN instance IPv4
address family will delete all the IS-IS processes bound to the
VPN instance or the VPN instance IPv4 address family on the
PE.
Case description
In this case, the addresses for interconnecting devices are as
follows:
If RTX interconnects with RTY, the addresses are
XY.1.1.X and XY.1.1.Y, network mask is 24.
Assume that PE1 is RT1, PE2 is RT2, P is RT3.
Command usage
ip binding vpn-instance: binds the current AC interface to a
specified VPN instance.
ipv4-family: enters the IPv4 address family view of BGP.
Precautions
Specify a VPN instance for each OSPF process on the PE
device.
Deleting a VPN instance or disabling a VPN instance IPv4
address family will delete all the OSPF processes bound to the
VPN instance or the VPN instance IPv4 address family on the
PE.
Case description
In this case, the addresses for interconnecting devices are as
follows:
If RTX interconnects with RTY, the addresses are
XY.1.1.X and XY.1.1.Y, network mask is 24.
Assume that PE1 is RT1, PE2 is RT2, P is RT3.
Command usage
ip binding vpn-instance: binds the current AC interface to a
specified VPN instance.
peer substitute-as: replaces the AS number of the peer
specified in the AS_Path attribute with the local AS number.
Precautions
VPN sites in the same AS or with different private AS numbers
can communicate over the BGP/MPLS IP VPN backbone
network. Sites in the same VPN have the same AS number.
When a local CE device establishes an EBGP neighbor
relationship with a PE device, you need to run the peer
substitute-as command to enable AS number substitution on
the PE device. If AS number substitution is disabled, the local
CE device discards VPN routes with the local AS number. As
a result, VPN users cannot communicate with each other.
To improve the high availability (HA) of a device, increase the mean time between failures (MTBF) and reduce the mean time to repair (MTTR).
Concepts
Two network devices establish a BFD session to detect the
bidirectional forwarding paths between them and serve upper-layer
applications. BFD does not provide the neighbor discovery mechanism.
Instead, BFD obtains neighbor information from the upper-layer
applications BFD serves. After the BFD session is established, the
local device periodically sends BFD packets. If the local device does
not receive a response from the peer device within the detection time, it
considers the forwarding path faulty. BFD then notifies the upper-layer
application for processing.
BFD control messages are encapsulated in UDP packets. The
destination port number is 3784 and source port number is a random
value from 49152 to 65535.
BFD session establishment process
OSPF discovers neighbors using the hello mechanism and sets up
connections to neighbors.
After setting up a neighbor relationship, OSPF notifies neighbor
information (including destination and source addresses) to BFD.
BFD sets up a session by using the received neighbor information.
After the BFD session is set up, BFD starts to detect link faults and
rapidly responds to link faults.
BFD fault detection process
A link fault is detected.
BFD detects the link fault and changes the BFD session status to
Down.
BFD notifies the local OSPF device that the BFD peer is unreachable.
Local OSPF process tears down the connection with the OSPF
neighbor.
A BFD session can be in one of the following states: Down, Init, Up, and AdminDown.
Down: indicates that a BFD session is in the Down state or has just
been set up.
Init: indicates that the local system can communicate with the peer
system, and the local system expects to make the session Up.
Up: indicates that a session is established successfully.
AdminDown: indicates that a session is in the AdminDown state.
BFD session status transition:
R1 and R2 start BFD state machines respectively. The initial state of
BFD state machine is Down. R1 and R2 send BFD control messages
with the State field as Down.
After receiving the BFD message with the State field as Down from
R1, R2 switches the session status to Init and sends a BFD message
with State field as Init.
After the local BFD session status of R2 changes to Init, R2 no longer
processes the received BFD messages with the State field as Down.
The BFD session status change on R1 is the same as that on R2.
After receiving the BFD message with the State field as Init, R2
changes the local BFD session status to Up.
The BFD session status change on R1 is the same as that on R2.
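The transitions described above can be sketched as a small state machine. This is a simplified model: AdminDown and the detection-time expiry path back to Down are omitted, and both ends are driven in lockstep.

```python
# Each end starts in Down; receiving State=Down moves a session to Init,
# receiving State=Init (or Up) moves it to Up, and messages with
# State=Down are ignored once the local session has left Down.
def bfd_next_state(local, received_state):
    if local == "Down":
        if received_state == "Down":
            return "Init"
        if received_state == "Init":
            return "Up"
        return "Down"
    if local == "Init":
        return "Up" if received_state in ("Init", "Up") else "Init"
    return local                        # Up stays Up while the peer responds

r1 = r2 = "Down"
for _ in range(2):                      # two exchanges of control messages
    r1, r2 = bfd_next_state(r1, r2), bfd_next_state(r2, r1)
print(r1, r2)                           # both sessions reach Up
```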
Common Commands
Single-hop and multi-hop detection:
The bfd command enables the global BFD and
displays the BFD view.
The bfd bind peer-ip command creates a BFD
binding and establishes a BFD session.
The discriminator command sets the local and
remote discriminators for the current BFD
session.
The commit command submits the
configurations of a BFD session.
Association between BFD and interface status
The bfd command enables the global BFD and
displays the BFD view.
The bfd bind peer-ip default-ip command binds the
physical status of a physical link to the BFD session.
The discriminator command sets the local and remote
discriminators for the current BFD session.
The process-interface-status command associates
the status of the current BFD session with the status of
the interface to which the session is bound.
The configuration is similar to the configuration of BFD and route
association, and is omitted here.
When a router fails, neighbors at the routing protocol layer detect that
their neighbor relationships are Down and then become Up again after
a period of time. This is the flapping of neighbor relationships. The
flapping of neighbor relationships causes route flapping, which leads to
black hole routes on the restarted router or causes data services from
the neighbors to bypass the restarted router. This
decreases network reliability.
NSF is thus introduced to address the route flapping issue. The following
requirements must be met:
Hardware: Dual control boards must be configured for route processor
(RP) redundancy. One is the active board and the other is the standby
board. If the active board restarts, the standby board becomes the active
one. A distributed architecture is used. That is, data forwarding and
control are separated, and LPUs are responsible for data forwarding.
System software: When the active control board is running, it
synchronizes configuration and interface state information to the
standby control board. When an active/standby switchover occurs,
LPUs do not reset or withdraw forwarding entries, and the interfaces
remain Up.
Protocols: Graceful restart (GR) must be supported for related
network protocols, such as routing protocols OSPF, IS-IS, and BGP,
and other protocols such as Label Distribution Protocol (LDP) and
Resource Reservation Protocol (RSVP).
Graceful Restart (GR) is a mechanism that ensures nonstop service
data forwarding during an active/standby switchover or a protocol
restart. When a device is performing a protocol restart, it notifies
neighboring devices of its restart so that the neighboring relationships
and routes are stably maintained in a certain period. After the protocol
restart is complete, the neighboring devices synchronize configurations
(including the topologies, routes, and sessions maintained by the GR-related protocols) to the GR Restarter. The configurations on the GR
Restarter are quickly restored. During the protocol restart, route
flapping will not occur and packet forwarding path is not changed. The
entire system continuously works.
OSPF GR terms:
GR Restarter: indicates the GR-capable device where protocol restart
occurs.
GR Helper: indicates a device neighboring with the GR Restarter and
helping complete the GR process.
GR Session: indicates the process of GR capability negotiation
performed during OSPF neighbor relationship establishment. The
negotiated content includes whether the two parties have the GR
capability. If the GR capability negotiation is successful, the GR
process starts when the protocol restart occurs.
Assume that R1 and R2 have a stable OSPF neighbor relationship and
GR capability is enabled on R1 and R2. When R1 restarts, the GR
process is as follows:
After R1 restarts, it sends a Grace LSA to R2.
When R2 receives the Grace LSA sent by R1, it maintains the
neighbor relationship with R1.
R1 and R2 exchange hello and DD packets and synchronize LSDB to
each other. LSAs are not generated during GR; therefore, if R1
receives its own LSAs from R2 during LSDB synchronization, it stores
them and adds the Stable tag.
After LSDB synchronization is complete, R1 sends Grace LSA to
notify R2 that the GR is finished. R1 starts the OSPF process and
regenerates LSAs, and then deletes the LSAs that are tagged Stable
and not regenerated.
After restoring all routing entries, R1 starts to recalculate routes and
updates the FIB table.
OSPF GR commands:
The opaque-capability enable command enables the Opaque-LSA
capability. After Opaque-LSA capability is enabled, an OSPF process
can generate Opaque-LSAs and receive Opaque-LSAs from
neighboring devices.
The graceful-restart command enables OSPF GR.
IS-IS GR also uses the concepts of GR Restarter, GR Helper, and GR
Session, which are the same as those used in OSPF GR.
To support the GR feature, IS-IS adds the Restart TLV field to hello
packets and defines three timers.
T1 timer is similar to the IIH timer used in the IS-IS protocol. When a
device restarts, it creates a T1 timer on each interface and periodically
sends hello packets. The T1 timer on an interface is deleted only when
the interface receives all hello ACK packets and CSNP packets.
T2 defines the timeout period of LSDB synchronization after a device
restarts. The T2 timer of a Level is deleted only when the LSDB of this
Level completes synchronization. If LSDB synchronization is not
complete when the T2 timer expires, the T2 timer is deleted and GR
fails.
T3 defines the maximum time during which the GR Restarter
performs GR. If LSDB synchronization is not complete when the T3
timer expires, the T3 timer is deleted and GR fails.
Assume that R1 and R2 have a stable IS-IS neighbor relationship and
GR capability is enabled on R1 and R2. When R1 restarts, the GR
process is as follows:
T2 and T3 timers start when the IS-IS protocol on R1 is globally
enabled again. When the interface of R1 goes Up again and enables
the IS-IS protocol, the T1 timer starts on the interface and the interface
sends a hello packet.

When R2 receives the hello packet from R1, it maintains the neighbor
relationship with R1 and sends a hello packet. Then R2 sends a CSNP
packet and an LSP packet to R1 to help LSDB synchronization.
When the interface of R1 receives the hello packet and all CSNP
packets, R1 deletes the T1 timer; otherwise, R1 periodically sends hello
packets until it receives all hello packets and CSNP packets. If the
number of times the T1 timer expires reaches the maximum value, the
T1 timer is also deleted.
When the LSDB synchronization is complete, R1 deletes the T2 timer.
After all T2 timers are deleted, R1 deletes the T3 timer. When the
GR is complete, R1 restarts the IS-IS process: the IIH timer is started
on all interfaces, and R1 periodically sends hello packets.
After restoring all routing entries, R1 starts to recalculate routes and
updates the FIB table.
IS-IS GR command:
The graceful-restart command enables IS-IS GR.

LAND attack
Because of a vulnerability in the TCP three-way handshake, a LAND
attacker sends SYN packets whose source address and port are
identical to the destination address and port of the target device.
After receiving the SYN packet, the target host creates a
null TCP connection with the source and destination addresses as the
address of the target host. The connection is kept until expiration. The
target host will create many null TCP connections, wasting resources or
causing device breakdown.
After defense against malformed packet attacks is enabled, the
device checks source and destination addresses in TCP SYN packets
to prevent LAND attacks. The device considers TCP SYN packets with
the same source and destination addresses as malformed packets and
discards them.
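The check described above can be sketched in a few lines of Python (the packet fields and addresses are illustrative, not from the source):

```python
def is_land_packet(pkt):
    """Flag TCP SYN packets whose source and destination match (LAND attack)."""
    return (pkt["flags"] == "SYN"
            and pkt["src_ip"] == pkt["dst_ip"]
            and pkt["src_port"] == pkt["dst_port"])

# A packet spoofed to loop back to the target is treated as malformed.
attack = {"flags": "SYN", "src_ip": "10.0.0.1", "src_port": 80,
          "dst_ip": "10.0.0.1", "dst_port": 80}
normal = {"flags": "SYN", "src_ip": "10.0.0.2", "src_port": 4321,
          "dst_ip": "10.0.0.1", "dst_port": 80}
print(is_land_packet(attack))  # True  -> discard as malformed
print(is_land_packet(normal))  # False -> forward
```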
Commands for configuring defense against malformed packet attacks
The anti-attack abnormal enable command configures defense
against malformed packets. After the command is executed, the device
discards malformed packets.

TCP SYN attack


The TCP SYN attack takes advantage of a vulnerability in the TCP
three-way handshake. When the server receives the initial SYN packet
from a client, it replies with a SYN+ACK packet. While the server
waits for the final ACK packet from the client, the connection stays
half-open. If the server does not receive the ACK packet, it
retransmits the SYN+ACK packet to the client. If it still receives no
ACK, the server closes the connection and releases the session state
from memory. The interval from sending the initial SYN+ACK packet to
closing the connection is about 30 seconds.
During this interval, an attacker may send hundreds of thousands of
SYN packets to open ports without responding to the SYN+ACK packets
from the server. The server's memory is then exhausted and it cannot
accept new connection requests; as a result, the server closes all
active connections.
After defense against TCP SYN flood attacks is enabled, the device
limits the rate of TCP SYN packets so that system resources will not be
exhausted by attacks.

Commands for configuring defense against TCP SYN flood attacks


The anti-attack tcp-syn enable command enables the TCP SYN
flood attack defense.
The anti-attack tcp-syn car command configures the rate limit for
TCP SYN packets. If the rate of received TCP SYN flood packets
exceeds the limit, the device discards excess packets to ensure normal
working of CPU.

Two modes of URPF:


Strict mode
In this mode, a packet passes the check only when
the forwarding table contains a route to the packet's
source address and the outbound interface of that
route matches the inbound interface of the packet.
If route symmetry is ensured, you are advised to use
the URPF strict check. For example, if there is only one
path between two network edge devices, URPF strict
check can be used to ensure network security.
Loose mode
In this mode, packets pass the check as long as the
source IP addresses of the packets match the entries
in the routing table.
If route symmetry is not ensured, you are advised to
use the URPF loose check. For example, if there are
multiple paths between two network edge devices,
URPF loose check can be used to ensure network
security.
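The strict/loose distinction above can be sketched against a toy forwarding table (the prefixes and interface names are illustrative, and exact /24 lookups stand in for real longest-prefix matching):

```python
import ipaddress

# FIB: destination prefix -> outbound interface (illustrative entries).
FIB = {ipaddress.ip_network("2.1.1.0/24"): "GE0/0/1",
       ipaddress.ip_network("3.1.1.0/24"): "GE0/0/2"}

def urpf_check(src_ip, in_interface, strict=True):
    src = ipaddress.ip_address(src_ip)
    for prefix, out_if in FIB.items():
        if src in prefix:
            # Loose mode: any route back to the source is enough.
            # Strict mode: that route must also point out the inbound interface.
            return out_if == in_interface if strict else True
    return False  # no route back to the source at all -> drop

print(urpf_check("2.1.1.1", "GE0/0/1", strict=True))   # True
print(urpf_check("2.1.1.1", "GE0/0/2", strict=True))   # False: asymmetric path
print(urpf_check("2.1.1.1", "GE0/0/2", strict=False))  # True: loose mode passes
```

This is why loose mode is recommended when routing is asymmetric: the same packet that strict mode drops on a secondary path still passes the loose check.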
Topology description
A bogus packet with source IP address 2.1.1.1 is sent by the attacker
to S1. After receiving the bogus packet, S1 sends a response packet to
the destination device at 2.1.1.1. In this situation, both S1 and PC1 are
attacked by the bogus packets. If URPF is enabled on S1, when S1
receives the bogus packet with source IP address 2.1.1.1, URPF
discards the packet because the interface corresponding to the source
address of the packet does not match the interface receiving the packet.

URPF command
The urpf command enables URPF on an interface and sets the URPF
mode.

IPSG principles
IPSG matches IP packets against a static binding table or the
dynamic DHCP snooping binding table. Before a network device forwards
an IP packet, it compares the source IP address, source MAC address,
interface, and VLAN information in the IP packet with entries in the
binding table. If a
matching entry is found, the device considers the IP packet valid and
forwards it. Otherwise, the device considers the IP packet as an attack
packet and discards it.
Working process
After IPSG is configured on S1, S1 checks the incoming IP packets
against the binding table. When the packet information matches the
binding table, the packets are forwarded; otherwise, the packets are
discarded.
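The binding-table match described above can be sketched as follows (the entry values and interface names are made up for illustration):

```python
# Each binding entry ties an IP address to a MAC, interface, and VLAN,
# as a DHCP snooping entry or a `user-bind static` entry would.
BINDINGS = [
    {"ip": "192.168.1.10", "mac": "00-11-22-33-44-55",
     "interface": "GE0/0/3", "vlan": 10},
]

def ipsg_permit(pkt):
    """Forward only packets that match a binding entry on every field."""
    return any(all(pkt[k] == entry[k] for k in entry) for entry in BINDINGS)

legit = {"ip": "192.168.1.10", "mac": "00-11-22-33-44-55",
         "interface": "GE0/0/3", "vlan": 10}
spoof = dict(legit, mac="00-aa-bb-cc-dd-ee")  # forged source MAC
print(ipsg_permit(legit))  # True  -> forwarded
print(ipsg_permit(spoof))  # False -> discarded as an attack packet
```

DAI applies the same matching logic to ARP packets instead of IP packets.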
IPSG commands
The binding table can be generated through DHCP or manually
configured through static IP addresses (the user-bind static command
is used to configure static table).
The ip source check user-bind enable command enables the IPSG
function on an interface to check the received IP packets.
The ip source check user-bind check-item command configures
VLAN- or interface-based IP packet check items. This command is only
valid to dynamic binding table.

Topology description
The figure shows a scenario of the MITM attack. The attacker sends a
bogus ARP packet using the PC3's address as the source address to
PC1. PC1 records incorrect address mapping relationship of PC3 in the
ARP table. The attacker thus obtains the data sent by PC1 to PC3 and
sent by PC3 to PC1. Therefore, information between PC1 and PC3
leaks.
To prevent MITM attacks, configure DAI on S1.
When an attacker connects to S1 and attempts to send bogus ARP
packet to S1, S1 detects the attack behavior according to the DHCP
snooping binding table and discards the ARP packet. If the ARP
discarding alarm is enabled on S1, when the number of discarded ARP
packets exceeds the alarm threshold, S1 sends an alarm to notify the
administrator.
DAI uses DHCP snooping binding table to defend against MITM attacks.
Before a device forwards an ARP packet, it compares the source IP
address, source MAC address, interface, and VLAN information in the
ARP packet with entries in the binding table. If an entry is matched, the
device considers the packet valid and forwards it; otherwise, the device
considers the packet as an attack packet and discards it.
DAI command
The arp anti-attack check user-bind enable command enables DAI
on an interface or in a VLAN. That is, the device checks ARP packets
against the binding table.

QoS provides differentiated service quality for different applications,
for example, dedicated bandwidth, a decreased packet loss ratio,
short transmission delay, and reduced jitter.
Best-effort service model
Routers and switches are packet switching devices. They
select a transmission path for each packet based on TCP/IP and
use statistical multiplexing rather than dedicated connections
like TDM. Traditionally, IP provides only
one service model (Best-Effort). In this model, all packets
transmitted on a network have the same priority. Best-Effort
means that the IP network tries best to transmit all packets to
the correct destination addresses completely and ensure that
the packets are not discarded, damaged, repeated, or loss of
sequence during transmission. However, the Best-Effort model
does not guarantee any transmission indicators, such as delay
and jitter.
Strictly speaking, Best-Effort is not a QoS technology, but it is
the dominant service model on today's Internet, so it is worth
understanding.
Due to the Best-Effort model, the Internet has made a lot of
achievements. However, with the development of the Internet,
the Best-Effort model cannot meet increasing requirements of
emerging applications. Therefore, the SPs have to provide
more types of service based on the Best-Effort model, to meet
requirements of each application.

IntServ model
The IntServ model, developed by IETF in 1993, supports
various types of service on IP networks. It provides both real-time service and best-effort service on IP networks. The
IntServ model reserves resources for each information flow.
The source and destination hosts exchange RSVP messages
to establish packet categories and forwarding status on each
node along the transmission path. The model maintains a
forwarding state for each flow, so it has a poor extensibility.
There are millions of flows on the Internet, which consume a
large number of device resources. Therefore, this model is not
widely used. In recent years, IETF has modified the RSVP
protocol, and defines that RSVP can be used together with the
DiffServ model, especially in the MPLS VPN field. Therefore,
RSVP has a new improvement. However, this model still has
not been widely used. The DiffServ model addresses the
problems in the IntServ model, so the DiffServ model is a
widely used QoS technology.
DiffServ model
The IntServ has a poor extensibility. After 1995, SPs and
research organizations developed a new mechanism that
supports various services. This mechanism has a high
extensibility. In 1997, IETF recognized that the service model
in use is not applicable to network operation, and there should
be a way to classify information flows and provide
differentiated service for users and applications. Therefore,
IETF developed the DiffServ model, which classifies flow on
the Internet and provides differentiated service for them. The
DiffServ model supports various applications and is applicable
to many business models.

Precedence field
The 8-bit Type of Service (ToS) field in an IP packet header
contains a 3-bit IP precedence field.
Bits 0 to 2 constitute the Precedence field, representing
precedence values 7, 6, 5, 4, 3, 2, 1 and 0 in descending order
of priority. The highest priorities (values 7 and 6) are reserved
for routing and network control communication updates. User-level
applications can use only priority values 0 to 5. Bits 6 and
7 are reserved.
Apart from the Precedence field, a ToS field also contains the
D, T, and R sub-fields:
Bit D indicates the delay. The value 0 represents a
normal delay and the value 1 represents a short delay.
Bit T indicates the throughput. The value 0 represents
normal throughput and the value 1 represents high
throughput.
Bit R indicates the reliability. The value 0 represents
normal reliability and the value 1 represents high
reliability.

DSCP field
RFC 2474 redefines the ToS field as the DS field: the left-most
6 bits (the DSCP) identify the service type and the right-most
2 bits are reserved. DSCP can classify traffic into 64 categories.
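The two interpretations of the ToS/DS byte can be shown with simple bit shifts (a minimal sketch; DSCP EF = 46 is used as the example value):

```python
def tos_fields(tos_byte):
    """Split the 8-bit ToS/DS byte into its two interpretations."""
    precedence = tos_byte >> 5   # old IP Precedence: top 3 bits, 0-7
    dscp = tos_byte >> 2         # DiffServ: top 6 bits, 0-63
    return precedence, dscp

# DSCP EF is 46 (binary 101110); in the byte it occupies the high 6 bits.
print(tos_fields(0b10111000))  # (5, 46)
```

Note that a DSCP value of the form class * 8 maps back onto the old precedence values, which is how the CS PHBs stay compatible with IP ToS.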

Each DSCP value matches a Behavior Aggregate (BA) and
each BA matches a PHB (such as forward and discard); the
PHB is then implemented using QoS mechanisms (such as
traffic policing and queuing technologies).
DiffServ network defines four types of PHB: Expedited
Forwarding (EF), Assured Forwarding (AF), Class Selector
(CS), and Default PHB (BE PHB). EF PHB is applicable to the
services that have high requirements on delay, packet loss,
jitter, and bandwidth. AF PHBs are classified into four
categories and each AF PHB category has three discard
priorities to specifically classify services. The performance of
AF PHB is lower than the performance of EF PHB. CS PHBs
originate from IP TOS, and are classified into 8 categories. BE
PHB is a special type in CS PHB, and does not provide any
guarantee. Traffic on IP networks belongs to this category by
default.
Priority mapping configuration
Configure the trusted packet priorities: Run the trust command
to specify the packet priority to be mapped.
Configure the priority mapping table: Run the qos map-table
command to enter the 802.1p or DSCP mapping table view,
and run the input command to set the priority mappings.

Token bucket
A token bucket with a certain capacity stores tokens. The
system places tokens into a token bucket at the configured
rate. When the token bucket is full, excess tokens overflow
and no token is added.
A token bucket forwards packets according to the number of
tokens in the token bucket. If there are sufficient tokens in the
token bucket for forwarding packets, the traffic rate is within
the rate limit. Otherwise, the traffic rate is not within the rate
limit.
Single-rate-single-bucket
A token bucket is called bucket C. Tc indicates the number of
tokens in the bucket. Single-rate-single-bucket has two
parameters:
Committed Information Rate (CIR): indicates the rate of
putting tokens into bucket C, that is, the average traffic
rate permitted by bucket C.
Committed Burst Size (CBS): indicates the capacity of
bucket C, that is, the maximum volume of burst traffic
allowed by bucket C each time.
The system places tokens into the bucket at the CIR. If Tc is
smaller than the CBS, Tc increases; otherwise, Tc does not
increase.
B indicates the size of an arriving packet:
If B is smaller than or equal to Tc, the packet is colored
green, and Tc decreases by B.
If B is greater than Tc, the packet is colored red, and
Tc remains unchanged.
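The single-rate-single-bucket coloring rules above can be sketched directly (the CIR/CBS values are illustrative; a real implementation refills on hardware timestamps):

```python
class SingleBucket:
    """Single-rate, single-bucket coloring with CIR/CBS as described above."""
    def __init__(self, cir, cbs):
        self.cir, self.cbs = cir, cbs   # tokens/second, capacity in bytes
        self.tc = cbs                   # bucket C starts full
        self.last = 0.0

    def color(self, size, now):
        # Refill bucket C at the CIR, capped at the CBS.
        self.tc = min(self.cbs, self.tc + (now - self.last) * self.cir)
        self.last = now
        if size <= self.tc:             # B <= Tc: green, Tc decreases by B
            self.tc -= size
            return "green"
        return "red"                    # B > Tc: red, Tc unchanged

bucket = SingleBucket(cir=1000, cbs=1500)   # 1000 B/s, 1500 B burst
print(bucket.color(1500, 0.0))  # green: the burst fits within the CBS
print(bucket.color(100, 0.1))   # green: 0.1 s refills exactly 100 tokens
print(bucket.color(200, 0.1))   # red: no tokens left at this instant
```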

Single-Rate-Double-Bucket
Two token buckets are available: bucket C and bucket E. Tc and Te
indicate the number of tokens in the bucket. Single-rate-double-bucket
has three parameters:
Committed Information Rate (CIR): indicates the rate of
putting tokens into bucket C, that is, the average traffic
rate permitted by bucket C.
Committed Burst Size (CBS): indicates the capacity of
bucket C, that is, the maximum volume of burst traffic
allowed by bucket C each time.
Excess Burst Size (EBS): indicates the capacity of
bucket E, that is, the maximum volume of excess burst
traffic allowed by bucket E each time.
The system places tokens into the buckets at the CIR:
If Tc is smaller than the CBS, Tc increases.
If Tc is equal to the CBS and Te is smaller than the
EBS, Te increases.
If Tc is equal to the CBS and Te is equal to the EBS,
Tc and Te do not increase.
B indicates the size of an arriving packet:
If B is smaller than or equal to Tc, the packet is colored
green, and Tc decreases by B.
If B is greater than Tc and smaller than or equal to Te,
the packet is colored yellow and Te decreases by B.
If B is greater than Te, the packet is colored red, and
Tc and Te remain unchanged.

Double-Rate-Double-Bucket
Two token buckets are available: bucket P and bucket C. Tp and Tc
indicate the number of tokens in the bucket. Double-rate-double-bucket
has four parameters:
Peak information rate (PIR): indicates the rate at which
tokens are put into bucket P, that is, the maximum
traffic rate permitted by bucket P. The PIR must be
greater than the CIR.
Committed Information Rate (CIR): indicates the rate of
putting tokens into bucket C, that is, the average traffic
rate permitted by bucket C.
Peak Burst Size (PBS): indicates the capacity of bucket
P, that is, the maximum volume of burst traffic allowed
by bucket P each time. PBS is greater than CBS.
Committed Burst Size (CBS): indicates the capacity of
bucket C, that is, the maximum volume of burst traffic
allowed by bucket C each time.
The system places tokens into bucket P at the rate of PIR and
places tokens into bucket C at the rate of CIR:
If Tp is smaller than the PBS, Tp increases. If Tp is
greater than or equal to the PBS, Tp remains
unchanged.
If Tc is smaller than the CBS, Tc increases. If Tc is
greater than or equal to the CBS, Tc remains
unchanged.

B indicates the size of an arriving packet:


If B is greater than Tp, the packet is colored red.
If B is greater than Tc and smaller than or equal to Tp,
the packet is colored yellow and Tp decreases by B.
If B is smaller than or equal to Tc, the packet is colored
green, and Tp and Tc decrease by B.
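The double-rate-double-bucket rules can be sketched the same way (parameter values are illustrative; note PIR > CIR and PBS > CBS as required above):

```python
class TwoRateTwoBucket:
    """Double-rate, double-bucket coloring: bucket P (PIR/PBS), bucket C (CIR/CBS)."""
    def __init__(self, pir, pbs, cir, cbs):
        assert pir > cir and pbs > cbs
        self.pir, self.pbs, self.cir, self.cbs = pir, pbs, cir, cbs
        self.tp, self.tc = pbs, cbs     # both buckets start full
        self.last = 0.0

    def color(self, size, now):
        elapsed = now - self.last
        self.last = now
        self.tp = min(self.pbs, self.tp + elapsed * self.pir)  # refill P at PIR
        self.tc = min(self.cbs, self.tc + elapsed * self.cir)  # refill C at CIR
        if size > self.tp:
            return "red"                # exceeds even the peak allowance
        if size > self.tc:
            self.tp -= size
            return "yellow"             # within PIR but above CIR
        self.tp -= size
        self.tc -= size
        return "green"

meter = TwoRateTwoBucket(pir=2000, pbs=3000, cir=1000, cbs=1500)
print(meter.color(1500, 0.0))  # green: fits bucket C (and bucket P)
print(meter.color(1000, 0.0))  # yellow: C is empty, P still has tokens
print(meter.color(1000, 0.0))  # red: P is down to 500 tokens
```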

Traffic policing discards excess traffic to limit traffic within a proper
range and to protect network resources and enterprises' interests.
Traffic policing consists of:
Meter: measures the network traffic using the token bucket
mechanism and sends the measurement result to the marker.
Marker: colors packets in green, yellow, or red based on the
measurement result received from the meter.
Action: takes actions based on packet coloring results (packets in
green or yellow are forwarded and packets in red are discarded by
default) received from the marker. The following actions are defined:
Pass: forwards the packets that meet network
requirements.
Remark + pass: changes the local priorities of packets
and forwards them.
Discard: discards the packets that do not meet network
requirements.

If the rate of a type of traffic exceeds the threshold, the device lowers
the packet priority and then forwards or directly discards the packets.
By default, these packets are discarded.
Traffic policing commands:
Configure interface-based traffic policing: Run the qos car
command to create a QoS CAR profile and configure QoS CAR
parameters. The parameters in the command vary when the command
is executed on a WAN interface and a LAN interface.
Configure rate limiting on WAN interface: Run the qos lr command
to set the ratio of packet rate sent by a physical interface to the total
interface bandwidth.

Traffic shaping buffers excess traffic instead of discarding it,
limiting traffic to a proper range while protecting network resources
and enterprises' interests.
Traffic shaping process:
When packets arrive, the device classifies packets into different
types and places them into different queues.
If the queue that packets enter is not configured with traffic shaping,
the packets are immediately sent. Packets requiring queuing proceed
to the next step.
The system places tokens to the bucket at the specified rate (CIR):
If there are sufficient tokens in the bucket, the device
forwards the packets and the number of tokens
decreases.
If there are insufficient tokens in the bucket, the device
places the packets into the buffer queue. When the
buffer queue is full, packets are discarded.
When there are packets in the buffer queue, the system extracts the
packets from the queue and sends them periodically. Each time the
system sends a packet, it compares the number of packets with the
number of tokens till the tokens are insufficient to send packets or all
the packets are sent.
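The buffering behavior in the steps above can be sketched as follows (CIR, CBS, and queue length values are illustrative; real shapers drain the queue on a timer rather than explicit tick calls):

```python
from collections import deque

class Shaper:
    """Traffic shaping sketch: excess packets are buffered rather than
    dropped, and drained later as tokens accumulate at the CIR."""
    def __init__(self, cir, cbs, queue_len):
        self.cir, self.cbs, self.tokens = cir, cbs, cbs
        self.queue = deque()
        self.queue_len = queue_len
        self.sent, self.dropped = [], 0

    def enqueue(self, size):
        if len(self.queue) >= self.queue_len:
            self.dropped += 1            # buffer full: tail drop
        else:
            self.queue.append(size)

    def tick(self, elapsed):
        # Refill the bucket, then drain buffered packets while tokens last.
        self.tokens = min(self.cbs, self.tokens + elapsed * self.cir)
        while self.queue and self.queue[0] <= self.tokens:
            self.tokens -= self.queue[0]
            self.sent.append(self.queue.popleft())

shaper = Shaper(cir=1000, cbs=1000, queue_len=2)   # 1000 B/s, 2-packet buffer
for size in (800, 800, 800):                       # a three-packet burst
    shaper.enqueue(size)                           # third packet is dropped
shaper.tick(0.0)
print(shaper.sent, shaper.dropped)                 # [800] 1
shaper.tick(1.0)                                   # 1 s refills the bucket
print(shaper.sent, shaper.dropped)                 # [800, 800] 1
```

The contrast with policing is visible here: the second packet is delayed, not recolored or discarded.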
Traffic shaping commands:
Configure interface-based traffic shaping: Run the qos gts
command to configure traffic shaping on the interface.

Configure queue-based traffic shaping.


Run the qos queue-profile queue-profile-name
command to create a queue profile and display the
queue profile view.
Run the queue { start-queue-index [ to end-queue-index ] }
&<1-10> length { bytes bytes-value | packets packets-value }
command to set the length of each queue.
Run the queue { start-queue-index [ to end-queue-index ] }
&<1-10> gts cir cir-value [ cbs cbs-value ] command to
configure queue-based traffic shaping. By default, traffic
shaping is not performed for queues.
Run the qos queue-profile queue-profile-name
command to apply the queue profile to an interface.

If the rate of incoming packets on an interface is higher than the rate of
outgoing packets, the interface is congested. If there is insufficient
space for storing the packets, some packets are discarded. When
packets are discarded, hosts or routers retransmit the packets, leading
to a vicious circle.
When congestion occurs, multiple packets preempt resources. The
packets that cannot obtain resources are discarded. The bandwidth,
delay, and jitter of key services cannot be ensured. The core of
congestion management is to decide the resource scheduling policy
that specifies the packet forwarding sequence. Generally, devices use
the queue technology to cope with congestion. The queue technology
involves queue creation, traffic classification, and queue scheduling.
Initially, there was only one queue scheduling policy, First In First
Out (FIFO). To meet different service requirements, more scheduling
policies were developed.
Queue scheduling mechanisms include hardware queue scheduling
and software queue scheduling. The hardware queue is also called the
transmit queue (TxQ); the interface driver uses this queue when
transmitting packets one by one. The hardware queue is a FIFO queue. Software
queue schedules data packets to hardware queue according to QoS
requirements. It can use multiple scheduling methods.
Data packets enter the software queue only when the hardware queue
is full.

The hardware queue length depends on the bandwidth setting on the
interface. If the interface bandwidth is high, transmission delay is short,
so queue length can be long. An appropriate hardware queue length is
important. If the hardware queue length is too long, the policy execution
performance of the software queue degrades because the hardware
queue uses the FIFO mechanism for scheduling. If the hardware queue
length is too short, scheduling efficiency is low, link use efficiency is low,
and the CPU usage is high.
LAN ports support the PQ, DRR, and WRR scheduling modes.
WAN ports support the PQ and WFQ scheduling modes.
Configuration commands:
Run the qos queue-profile queue-profile-name command to
create a queue profile and display the queue profile view.
On the WAN-side interface, run the schedule { pq
start-queue-index [ to end-queue-index ] | wfq start-queue-index
[ to end-queue-index ] } command to set a scheduling mode
for each queue on the WAN-side interface.
On the LAN-side interface, run the schedule { pq
start-queue-index [ to end-queue-index ] | drr start-queue-index
[ to end-queue-index ] | wrr start-queue-index [ to
end-queue-index ] } command to set a scheduling mode for each
queue on the LAN-side interface.
Run the qos queue-profile queue-profile-name command to
apply the queue profile to an interface.

FIFO characteristics:
Advantages:
Simple
Disadvantages:
Unfair and no separation between flows. A large flow
will occupy the bandwidth of other flows, which
prolongs the delay of other flows.
When congestion occurs, FIFO discards some packets.
When TCP detects packet loss, it lowers transmission
speed to avoid congestion. However, UDP does not
lower transmission speed because it is a
connectionless protocol. As a result, the TCP and UDP
packets in FIFO are not equally processed. The TCP
packet rate is too low.
A flow may occupy all the buffer space and block
other types of traffic.

RR

Advantages:
Different flows are separated, and bandwidth is equally
allocated to queues.
Available bandwidth is equally allocated to other
queues.
Disadvantages:
Weights cannot be configured for the queues.
When queues have different packet lengths,
scheduling is inaccurate.
When scheduling rate is low, delay and jitter indicators
will deteriorate. For example, when a packet arrives at
an empty queue that is just scheduled, this packet can
be processed only when all the other queues are
scheduled. In this situation, jitter is serious. However, if
scheduling rate is high, the delay is short. The RR
mode is widely used on high-speed routers.

Compared with RR, WRR can set the weights of queues. During the
WRR scheduling, the scheduling chance obtained by a queue is in
direct proportion to the weight of the queue. During the WRR
scheduling, empty queues are skipped. Therefore, when there
is a small volume of traffic in a queue, the remaining bandwidth of
that queue is shared by the other queues in proportion to their weights.
Advantages:
Bandwidth is allocated based on weights, and the
remaining bandwidth of a queue is equally allocated to
other queues. Low-priority queues are also scheduled
in a timely manner.
It is easy to implement.
Applicable to DiffServ ports.
Disadvantages:
Similar to RR, WRR is inaccurate when queues have
different packet lengths.
When scheduling rate is low, packet delay is unstable
and the delay and jitter indicators cannot be lowered to
the expected values.
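The weighted round-robin behavior described above can be sketched with a one-packet-per-visit scheduler (the queue contents and weights are illustrative):

```python
def wrr_schedule(queues, weights, rounds):
    """One-packet-per-visit WRR: each round, queue i is visited weights[i]
    times; empty queues are skipped so their share goes to the others."""
    order = []
    for _ in range(rounds):
        for i, weight in enumerate(weights):
            for _ in range(weight):
                if queues[i]:
                    order.append(queues[i].pop(0))
    return order

q0 = ["a1", "a2", "a3", "a4"]   # weight 2: scheduled twice per round
q1 = ["b1", "b2"]               # weight 1: scheduled once per round
print(wrr_schedule([q0, q1], weights=[2, 1], rounds=3))
# ['a1', 'a2', 'b1', 'a3', 'a4', 'b2']
```

This also shows the stated weakness: the scheduler counts packets, not bytes, so queues with larger packets get more than their weighted share of bandwidth.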

PQ

PQ has four queue levels: Top, Middle, Normal, and
Bottom. However, most devices support eight queue levels.
Packets in queues with a low priority can be scheduled only
after all packets in queues with a high priority have been
scheduled. Therefore, PQ has obvious advantages and
disadvantages.
PQ ensures that the packets in high-priority queues obtain
high bandwidth, low delay and jitter; however, the packets in
low-priority queues cannot be scheduled in a timely manner, or
even at all. As a result, low-priority queues may starve.
PQ has the following characteristics:
Uses ACL to classify packets into different types and
adds packets to the corresponding queues.
Packets are discarded only by using the Tail Drop
mechanism.
When the queue length is set to 0, the queue length
can be infinite. That is, the packets entering this queue
are not discarded by Tail Drop unless the memory
space is exhausted.
FIFO logic is used within each queue.
The packets in low-priority queues are scheduled only
after all packets in high-priority queues are scheduled.
PQ ensures high quality for specified service traffic, but does
not care about the quality of other services.

Advantages:
Precisely controls the delay of high-priority queues.
Easy to implement, differentiating services
Disadvantages:
Cannot allocate bandwidth as required. When high-priority queues have many packets, the packets in low-priority queues cannot be scheduled.
It shortens the delay of high-priority queues by
compromising the service quality of low-priority queues.
If a high-priority queue transmits TCP packets and a
low-priority queue transmits UDP packets, the TCP
packets are transmitted at a high speed, while UDP
packets cannot obtain sufficient bandwidth.

CQ

The number of bytes to be scheduled must be specified for
each queue. In each scheduling pass, a queue keeps sending
packets until the specified byte count is used up; because a
packet is always sent whole, the last packet may exceed the
remaining byte credit. If the configured byte count is too small,
bandwidth allocation is inaccurate. For example, if 500 bytes is
specified for a queue while most packets in the queue exceed
1000 bytes, the bandwidth actually allocated is higher than
expected. If the configured byte count is large, delay is difficult
to control. CQ can schedule multiple packets each time; the
number of packets scheduled is as many as the configured byte
count accommodates.
Advantages:
Allocates bandwidth according to certain percentages.
When the traffic volume of a queue is small, other
queues can occupy the bandwidth of this queue.
Easy to implement
Disadvantages:
When the specified number of bytes is small,
bandwidth allocation is inaccurate. When the specified
number of bytes is large, delay and jitter are serious.

WFQ

Weighted Fair Queuing (WFQ) classifies packets by flow. On
an IP network, the packets with the same source IP addresses,
destination IP addresses, protocol numbers, and IP
precedence belong to the same flow. On an MPLS network,
the packets with the same labels and EXP fields belong to the
same flow. WFQ assigns each flow to a queue, and tries to
assign different flows to different queues. When packets leave
the queues, WFQ allocates the bandwidth on the outbound
interface for each flow according to the weights. The smaller
the weight value of the flow is, the smaller the bandwidth the
flow obtains. The greater the weight value of the flow is, the
greater the bandwidth the flow obtains. In this manner,
services of the same priority are treated equally, and services
of different priorities are allocated different weights.
For example, there are eight flows on the interface, with
weights as 1, 2, 3, 4, 5, 6, 7, and 8 respectively. The total
bandwidth quota is the sum of weights, that is, 1 + 2 + 3 + 4 +
5 + 6 + 7 + 8 = 36. The bandwidth occupied by each flow is:
Weight of each flow/Total bandwidth quota. That is, flows
obtain the bandwidth of 1/36, 2/36, 3/36, 4/36, 5/36, 6/36, 7/36,
and 8/36. Thus, WFQ assigns different scheduling weights to
services of different priorities while ensuring fairness between
services of the same priority.
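The bandwidth arithmetic in the example above can be checked with a short sketch (the 36 Mbit/s link rate is chosen so the per-flow shares come out to whole numbers):

```python
# Weights 1..8 as in the example above; the total quota is their sum.
weights = [1, 2, 3, 4, 5, 6, 7, 8]
total = sum(weights)
print(total)          # 36: flow i gets weights[i]/36 of the bandwidth

# On a hypothetical 36 Mbit/s link, each flow's share in Mbit/s:
link_mbps = 36
shares = [w * link_mbps / total for w in weights]
print(shares)         # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```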
Advantages:

The queues are scheduled fairly based on the
granularity of bytes.
Differentiates services and allocates weights.
Properly controls delay and reduces jitter.
Disadvantages:
Difficult to implement.

Congestion Avoidance
Tail drop is a traditional method in the congestion avoidance
mechanism. When the length of a queue reaches the
maximum value, all newly arriving packets are discarded. If too many
TCP packets are dropped, TCP times out. This may result in
slow TCP start and trigger the congestion avoidance
mechanism so that the device slows down the transmission of
TCP packets. When queues drop several TCP-connection
packets at the same time, these TCP connections start
congestion avoidance and slow startup, which is referred to as
global TCP synchronization. Thus, these TCP connections
simultaneously send fewer packets to the queue so that the
rate of incoming packets is smaller than the rate of outgoing
packets, reducing the bandwidth usage. Moreover, the volume
of traffic sent to the queue varies greatly from time to time. As
a result, the volume of traffic over the link fluctuates between
the bottom and the peak. The delay and jitter of certain traffic
are affected.
The traditional packet loss policy uses the tail drop method.
When the queue length reaches the upper limit, the excess
packets (buffered at the queue tail) are discarded.
To prevent global TCP synchronization, Random Early
Detection (RED) is used. The RED technique randomly
discards packets to prevent the transmission speed of multiple
TCP connections from being reduced simultaneously. The
TCP rate and network traffic volume thus are stable.

The device provides Weighted Random Early Detection
(WRED) based on RED technology. WRED discards packets
in queues based on DSCP field or IP precedence. The upper
drop threshold, lower drop threshold, and drop probability can
be set for each priority. When the number of packets of a
priority reaches the lower drop threshold, the device starts to
discard packets. When the number of packets reaches the
upper drop threshold, the device discards all newly arriving
packets. Between the two thresholds, the drop probability rises
as the queue length grows, up to the configured maximum drop
probability. WRED discards packets in queues based on this drop
probability, thereby relieving congestion.
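The per-packet WRED drop decision can be sketched as follows (threshold and probability values are illustrative; real devices configure them per priority via a drop profile):

```python
import random

def wred_drop(queue_len, low, high, max_prob):
    """Per-packet drop decision using the thresholds described above."""
    if queue_len < low:
        return False                 # below the lower threshold: never drop
    if queue_len >= high:
        return True                  # at/above the upper threshold: always drop
    # Between the thresholds the drop probability rises linearly
    # from 0 toward max_prob.
    prob = max_prob * (queue_len - low) / (high - low)
    return random.random() < prob

print(wred_drop(10, low=20, high=40, max_prob=0.5))  # False
print(wred_drop(40, low=20, high=40, max_prob=0.5))  # True
# At queue length 30 (midway between the thresholds), each arriving
# packet is dropped with probability 0.25.
```

Because flows lose packets randomly rather than all at once at the tail, their TCP senders back off at different times, which is exactly what breaks global TCP synchronization.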
WRED configuration:
Configure a drop profile.
 Run the drop-profile drop-profile-name command to create a
 drop profile and enter the drop profile view.
 Run the dscp { dscp-value1 [ to dscp-value2 ] } &<1-10>
 low-limit low-limit-percentage high-limit high-limit-percentage
 discard-percentage discard-percentage command to set DSCP-based
 WRED parameters.
 Run the ip-precedence { ip-precedence-value1 [ to
 ip-precedence-value2 ] } &<1-10> low-limit low-limit-percentage
 high-limit high-limit-percentage discard-percentage
 discard-percentage command to set IP precedence-based WRED
 parameters.
Apply the drop profile.
 Run the qos queue-profile queue-profile-name command to create
 a queue profile and enter the queue profile view.
 Run the schedule wfq start-queue-index [ to end-queue-index ]
 command to set the scheduling mode of the queues to WFQ.
 Run the queue { start-queue-index [ to end-queue-index ] }
 &<1-10> drop-profile drop-profile-name command to bind the drop
 profile to queues in the queue profile.
 Run the qos queue-profile queue-profile-name command in the
 interface view to apply the queue profile to the interface.
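Putting the steps above together, a minimal configuration might look as follows. The profile names, DSCP value, thresholds, and interface are illustrative assumptions, not values from the source.

```
# Create a drop profile with DSCP-based WRED parameters
# (names and values here are assumed for illustration).
drop-profile wred-af
 dscp af11 low-limit 50 high-limit 80 discard-percentage 20
quit
# Bind the drop profile to WFQ queues in a queue profile.
qos queue-profile qp1
 schedule wfq 0 to 5
 queue 0 to 5 drop-profile wred-af
quit
# Apply the queue profile to an interface.
interface GigabitEthernet0/0/1
 qos queue-profile qp1
```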
Traffic classification is used to identify packets with certain
characteristics according to a rule, and is the prerequisite and
basis for differentiated services. You can define rules to
classify packets and specify the relationship between rules:
AND: Packets match a traffic classifier only when the packets
match all the rules. If a traffic classifier contains ACL rules,
packets match the traffic classifier only when the packets
match one ACL rule and all the non-ACL rules. If a traffic
classifier does not contain ACL rules, packets match the traffic
classifier only when the packets match all the non-ACL rules.
OR: Packets match a traffic classifier as long as the packets
match a rule.
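The AND/OR matching semantics above can be modeled in a few lines. This is a hypothetical sketch: the rule predicates, packet fields, and function names are invented for illustration and do not come from the device software.

```python
def matches(packet, acl_rules, non_acl_rules, operator="and"):
    """Evaluate a traffic classifier against a packet (illustrative model).

    Each rule is a predicate taking the packet and returning True/False.
    AND: the packet must match all non-ACL rules and, if any ACL rules
    exist, at least one ACL rule.  OR: matching any single rule suffices.
    """
    if operator == "and":
        acl_ok = any(r(packet) for r in acl_rules) if acl_rules else True
        return acl_ok and all(r(packet) for r in non_acl_rules)
    return any(r(packet) for r in acl_rules + non_acl_rules)

# Hypothetical rules for demonstration
is_http  = lambda p: p.get("dport") == 80            # plays the role of an ACL rule
from_lan = lambda p: p.get("src", "").startswith("10.")  # non-ACL rule

pkt = {"src": "10.1.1.5", "dport": 80}
print(matches(pkt, [is_http], [from_lan], "and"))    # True
```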
A traffic behavior is an action taken on classified packets;
traffic classification is performed in order to provide
differentiated services. A traffic classifier takes effect only
when it is associated with a traffic control action or a
resource allocation action.
A traffic policy is configured by binding traffic classifiers to traffic
behaviors. After a traffic policy is applied to an interface,
globally, to a board, or to a VLAN, differentiated service is
provided.
Traffic policy configuration commands
Configure a traffic classifier.
 Run the traffic classifier classifier-name [ operator
 { and | or } ] command to create a traffic classifier and
 enter the traffic classifier view.
Configure a traffic behavior.
Run the traffic behavior behavior-name command to
create a traffic behavior and enter the traffic behavior
view.
Configure a traffic policy.
Run the traffic policy policy-name command to create
a traffic policy and enter the traffic policy view.
Run the classifier classifier-name behavior behavior-name
command to bind a traffic behavior to a traffic classifier
in the traffic policy.
Run the traffic-policy policy-name { inbound | outbound }
command to apply the traffic policy to an interface or
sub-interface in the inbound or outbound direction.
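The commands above can be combined into a minimal end-to-end example. The classifier, behavior, and policy names, the ACL number, and the interface are assumed for illustration; the ACL itself must already exist.

```
# Classify traffic matching an assumed, pre-configured ACL 3000
# and re-mark it, then apply the policy inbound on an interface.
traffic classifier c-http operator and
 if-match acl 3000
quit
traffic behavior b-remark
 remark dscp af21
quit
traffic policy p-demo
 classifier c-http behavior b-remark
quit
interface GigabitEthernet0/0/1
 traffic-policy p-demo inbound
```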
SNMP model
The NMS (network management station) is the manager in a
network management system. It uses SNMP to manage and monitor
network devices. The NMS software runs on an NMS server.
The agent is a process on the managed device. It maintains data
on the managed device, receives and processes request packets
from the NMS, and sends response packets back to the NMS.
A managed object is an object to be managed. A device may have
multiple managed objects, including hardware components (such
as an interface board) and parameters (such as a routing
protocol) configured on the hardware or software.
The MIB is a database specifying the variables that are
maintained by the managed device and can be queried or set by
the agent. The MIB defines attributes of the managed device,
including the name, status, access rights, and data type of
managed objects.
Operations of SNMPv1 and SNMPv2c
Get: reads one or several parameter values from the MIB of the agent
process.
GetNext: reads the next parameter value from the MIB of the agent
process.
Set: sets one or several parameter values in the MIB of the agent
process.
Response: returns one or more queried values. The agent
performs this operation in response to the GetRequest,
GetNextRequest, SetRequest, and GetBulkRequest operations. Upon
receiving a Get or Set request, the agent queries or modifies
the MIB tables and then sends a response to the NMS.
Trap: sent by an agent process to notify the NMS of a fault or event
on the managed device.
New Operation Types of SNMPv2c
GetBulk: the NMS queries managed devices in batches. GetBulk is
implemented based on GetNext: one GetBulk operation is
equivalent to a series of GetNext operations, and you can
specify how many times the GetNext operation is executed on the
managed device during one GetBulk interaction.
InformRequest: sent by a managed device to notify the NMS of an
alarm. After the managed device sends an Inform, the NMS must
return an InformResponse packet to the managed device.
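On a Huawei device, enabling the agent side of this model might look as follows. The community name and NMS address are assumed placeholder values.

```
# Enable the SNMP agent with an SNMPv2c read community and
# send traps to an assumed NMS at 10.1.1.100.
snmp-agent
snmp-agent sys-info version v2c
snmp-agent community read public123
snmp-agent trap enable
snmp-agent target-host trap address udp-domain 10.1.1.100 params securityname public123 v2c
```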
Operations related to SNMPv3:
The NMS sends a Get request without security parameters to the
agent.
The agent responds and returns requested parameters to the NMS.
The NMS sends a Get request carrying security parameters to the
agent.
The agent encrypts the response packet and returns the
requested parameters to the NMS.
NQA Principles
Creating a test instance
NQA requires two test ends, an NQA client and an NQA
server (also called the source and destination). The
NQA client (the source) initiates an NQA test. You can
configure test instances through the command line or
the NMS; NQA then places the test instances into test
queues for scheduling.
Starting the test instance
When starting an NQA test instance, you can choose to
start the test instance immediately, at a specified time,
or after a delay. A test packet is generated based on
the type of a test instance when the timer expires. If the
size of the generated test packet is smaller than the
minimum size of a protocol packet, the test packet is
generated and sent out with the minimum size of the
protocol packet.
Processing a test instance
After a test instance starts, the protocol-related running
status can be collected according to response packets.
The client adds a timestamp to a test packet based on
the local system time before sending the packet to the
server. After receiving the test packet, the server sends
a response packet to the client. The client then adds a
timestamp to the received response packet based on
the current local system time. This helps the client
calculate the round-trip time (RTT) of the test packet
based on the two timestamps.
An NQA ICMP test instance checks whether the route from the NQA
client to a destination is reachable. The ICMP test serves a
similar purpose to the ping command but provides more output
information:
By default, the command output shows the results of the latest five
tests.
The output includes the average delay, the packet loss ratio, and the
time the last packet is correctly received.
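A minimal ICMP test instance on the client might be configured as follows; the instance names and destination address are assumed values, and an ICMP test needs no NQA server.

```
# Create and start an NQA ICMP test instance on the client.
nqa test-instance admin icmp-probe
 test-type icmp
 destination-address ipv4 10.2.2.2
 start now
quit
# View the test results:
display nqa results test-instance admin icmp-probe
```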
Test Procedure
Source (R1) sends an ICMP echo request packet to the destination
(R2).
After receiving the ICMP echo request packet, the destination (R2)
responds to the source (R1) with an ICMP echo reply packet.
The source (R1) can then calculate the communication time
between the source (R1) and the destination (R2) by subtracting
the time it sent the ICMP echo request from the time it
received the ICMP echo reply. The calculated data reflects the
network performance and operating status.
NTP synchronization process
R1 sends an NTP packet to R2. The packet carries a timestamp,
10:00:00 am (T1), indicating the time it leaves R1.
When the NTP packet reaches R2, R2 adds a receive timestamp,
11:00:01 am (T2), to the packet, indicating the time R2
received it.
When the NTP packet leaves R2, R2 adds a transmit timestamp,
11:00:02 am (T3), to the NTP packet, indicating the time it leaves R2.
When R1 receives this response packet, it adds a new receive
timestamp, 10:00:03 am (T4), to the packet. R1 uses the received
information to calculate the following two important parameters:
Round-trip delay of the NTP packet: Delay = (T4 - T1) - (T3 - T2)
Clock offset of R1 relative to R2: Offset =
((T2 - T1) + (T3 - T4)) / 2
After the calculation, R1 knows that the round-trip delay is 2
seconds and that its clock is offset from R2 by 1 hour. R1
adjusts its own clock based on these two parameters to
synchronize with R2.
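The arithmetic above can be checked with a short script. It simply applies the two formulas to the example timestamps (all converted to seconds since midnight).

```python
def ntp_delay_offset(t1, t2, t3, t4):
    """Round-trip delay and clock offset from the four NTP timestamps
    (in seconds).  t1/t4 are the client's send/receive times; t2/t3 are
    the server's receive/send times."""
    delay = (t4 - t1) - (t3 - t2)
    offset = ((t2 - t1) + (t3 - t4)) / 2
    return delay, offset

# The example above: R1 sends at 10:00:00, R2 receives at 11:00:01,
# R2 replies at 11:00:02, R1 receives at 10:00:03.
t1 = 10 * 3600
t2 = 11 * 3600 + 1
t3 = 11 * 3600 + 2
t4 = 10 * 3600 + 3
delay, offset = ntp_delay_offset(t1, t2, t3, t4)
print(delay, offset)   # 2-second round-trip delay, 3600 s (1 h) offset
```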